Abstract

A data integration system provides transparent access to different data sources by suitably combining
their data, and providing the user with a unified view of them, called global schema. However, source
data are generally not under the control of the data integration process; thus, integrated data may
violate global integrity constraints even in the presence of locally consistent data sources. In this
scenario, it may still be of interest to retrieve as much consistent information as possible. The process
of answering user queries under global constraint violations is called consistent query answering (CQA).
Several notions of CQA have been proposed, e.g., depending on whether integrated information is
assumed to be sound, complete, exact or a variant of them. This paper provides a contribution in
this setting: it unifies solutions coming from different perspectives under a common ASP-based
core, and provides query-driven optimizations designed for isolating and eliminating inefficiencies
of the general approach for computing consistent answers. Moreover, the paper introduces some new
theoretical results enriching existing knowledge on decidability and complexity of the considered
problems. The effectiveness of the approach is evidenced by experimental results.

To appear in Theory and Practice of Logic Programming (TPLP).
1 Introduction

The enormous amount of information dispersed over many data sources, often stored in
different heterogeneous databases, has recently boosted the interest in data integration
systems (Lenzerini 2002). Roughly speaking, a data integration system provides transparent
access to different data sources by suitably combining their data, and providing the
user with a unified view of them, called global schema. In many cases, the application
domain imposes some consistency requirements on integrated data. For instance, it may
be desirable to impose some integrity constraints (ICs), like primary/foreign keys,
on the global relations. Data stored at the sources may violate global ICs when integrated,
since in general data sources are not under the control of the data integration process.
The standard approach to this problem basically consists of explicitly modifying the data
in order to eliminate IC violations (data cleaning). However, the explicit repair of data is
not always convenient or possible. Therefore, when answering a user query, the system should
be able to "virtually repair" relevant data (in the line of Arenas et al. 2003; Bertossi et al. 2005;
Chomicki and Marcinkowski 2005), in order to provide consistent answers; this task is also
called Consistent Query Answering (CQA).

The database community has spent considerable effort in this area, and relevant research
results have been obtained to clarify semantics, decidability, and complexity of data
integration under constraints and, specifically, of CQA. In particular, several notions of
CQA have been proposed (see Bertossi et al. 2005 for a survey), e.g. depending on whether
the information in the database is assumed to be sound, complete or exact. However, while
efficient systems are already available for simple data integration scenarios, solutions that
are both scalable and comprehensive have not been implemented yet for CQA, mainly
because handling inconsistencies arising from constraint violations is inherently
hard. Moreover, mixing different kinds of constraints (e.g. denial constraints and inclusion
dependencies) on the same global database often makes the query answering process
undecidable (Abiteboul et al. 1995; Calì et al. 2003a).

This paper provides some contributions in this setting. Specifically, it starts from different
state-of-the-art semantic perspectives (Arenas et al. 2003; Calì et al. 2003a; Chomicki and
Marcinkowski 2005) and revisits them in order to provide a uniform, common core based on
Answer Set Programming (ASP) (Gelfond and Lifschitz 1988; Gelfond and Lifschitz 1991). Then, it
provides query-driven optimizations, in the light of the experience we gained in the INFOMIX
(Leone et al. 2005) project, in order to overcome the limitations observed in real-world
scenarios. The main contributions of this paper can be summarized as follows:

• A theoretical analysis of considered semantics which extends previous results. 

• The definition of a unified framework for CQA based on a purely declarative, logic-based
approach which supports the most relevant semantic assumptions on source
data. Specifically, the problem of consistent query answering is reduced to cautious
reasoning on (disjunctive) ASP programs with aggregates (Faber et al. 2010),
automatically built from both the query and the involved constraints.

• The definition of an optimization approach designed to (1) "localize" and limit the
inefficient part of the computation of consistent answers to small fragments of the
input, and (2) lower the computational complexity of the repair process where possible.

• The implementation of the entire framework in a full-fledged prototype system.

• The capability of handling large amounts of data, typical of real-world data integration
scenarios, using as internal query evaluator the DLV^DB (Terracina et al. 2008)
system; indeed, DLV^DB allows for mass-memory database evaluations and distributed
data management features.

In order to assess the effectiveness of the proposed approach, we carried out experimen- 
tal activities both on a real world scenario and on synthetic data, comparing its behavior 
on different semantics and constraints. 

The plan of the paper is as follows. Section 2 formally introduces the notion of CQA
under different semantics and some new theoretical results on decidability and complexity
for this problem. Section 3 first introduces a unified (general) solution to handle CQA via
ASP, and then presents some optimizations. Section 4 describes the benchmark framework
we adopted in the tests and discusses the obtained results. Finally, Section 5 compares
related work and draws some concluding considerations.



2 Data Integration Framework 

In this paper we exploit the data integration setting to point out motivations and challenges 
underlying CQA. However, as it will be clarified in the following, techniques and results 
provided in the paper hold also for a single database setting. We next formally describe the 
adopted data integration framework. 

The following notation will be used throughout the paper. We always denote by $\Gamma$ a
countably infinite domain of totally ordered values; by $t$ a tuple of values from $\Gamma$; by $X$ a
variable; by $\bar{x}$ a sequence $X_1, \ldots, X_n$ of (not necessarily distinct) variables, and by
$|\bar{x}| = n$ its length. Let $\bar{x}$, $\bar{x}'$ be two sequences of variables; we denote by
$\bar{x} - \bar{x}'$ the sequence obtained from $\bar{x}$ by discarding a variable if it appears in $\bar{x}'$.
Whenever all the variables of a sequence $\bar{x}$ appear in another sequence $\bar{x}'$, we simply write
$\bar{x} \subseteq \bar{x}'$. Given a sequence $\bar{x}$ and a set $\pi \subseteq \{1, \ldots, |\bar{x}|\}$, we denote
by $\bar{x}^{\pi}$ the sequence obtained from $\bar{x}$ by discarding a variable if its position is not in
$\pi$. (Similarly, given a tuple $t$ and a set $\pi \subseteq \{1, \ldots, |t|\}$, we denote by $t^{\pi}$ the
tuple obtained from $t$ by discarding a value if its position is not in $\pi$.) Moreover, we denote
by $\sigma(\bar{x})$ a conjunction of comparison atoms of the form $X \odot X'$, where
$\odot \in \{<, >, \leq, \geq, \neq\}$, and by $\ominus$ the symmetric difference operator between two sets.

A relational database schema is a pair $\mathcal{R} = \langle names(\mathcal{R}), constr(\mathcal{R}) \rangle$ where
$names(\mathcal{R})$ and $constr(\mathcal{R})$ are the relation names and the integrity constraints (ICs) of
$\mathcal{R}$, respectively. The arity of a given relation $r \in names(\mathcal{R})$ is denoted by $arity(r)$.
A database (instance) for $\mathcal{R}$ is any set of facts (Abiteboul et al. 1995) of the form:

$\mathcal{F} = \{ r(t) : r \in names(\mathcal{R}) \wedge t \text{ is a tuple from } \Gamma \wedge |t| = arity(r) \}$

In the following, we adopt the unique name assumption, and $dom(\mathcal{F})$ denotes the subset
of the domain $\Gamma$ containing all the values appearing in the facts of $\mathcal{F}$.

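Let $r_1, \ldots, r_m \in names(\mathcal{R})$; the set $constr(\mathcal{R})$ contains ICs of the form:

1. $\forall \bar{x}_1, \ldots, \bar{x}_m \; \neg [\, r_1(\bar{x}_1) \wedge \ldots \wedge r_m(\bar{x}_m) \wedge \sigma(\bar{x}_1, \ldots, \bar{x}_m) \,]$ (denial constraints, DCs);

2. $\forall \bar{x}_{\forall} \; [\, r_1(\bar{x}_1) \rightarrow \exists \bar{x}_{2\exists} \; r_2(\bar{x}_2) \,]$ (inclusion dependencies, INDs);

where $arity(r_i) = |\bar{x}_i|$, for each $i$ in $[1..m]$. In particular, for INDs we require that all the
variables within an $\bar{x}_i$ ($1 \leq i \leq 2$) are distinct, $\bar{x}_{\forall} \subseteq \bar{x}_1$,
$\bar{x}_{\forall} \subseteq \bar{x}_2$, and $\bar{x}_{2\exists} = \bar{x}_2 - \bar{x}_{\forall}$. Note that, if
$|\bar{x}_{2\exists}| = 0$, then $\bar{x}_{\forall} = \bar{x}_2 \subseteq \bar{x}_1$. When we are only interested in
emphasizing the relation names involved in an IND, we simply write $r_1(\bar{x}_1) \rightarrow r_2(\bar{x}_2)$
or $r_1 \rightarrow r_2$. A database $\mathcal{F}$ is said to be consistent w.r.t. $\mathcal{R}$ if all ICs are
satisfied. A conjunctive query $cq(\bar{x})$ over $\mathcal{R}$ is a formula of the form

$\exists \bar{x}_{1\exists}, \ldots, \bar{x}_{n\exists} \;\; r_1(\bar{x}_1) \wedge \ldots \wedge r_n(\bar{x}_n) \wedge \sigma(\bar{x}_1, \ldots, \bar{x}_n)$

where $\bar{x}_{i\exists} \subseteq \bar{x}_i$ for each $i$ in $[1..n]$, $\bar{w} = \bar{x}_1 - \bar{x}_{1\exists}, \ldots, \bar{x}_n - \bar{x}_{n\exists}$
are the free variables of $cq$, and $\bar{x}$ contains only and all the variables of $\bar{w}$ (with no
duplicates, and possibly in a different order). A union of conjunctive queries $q(\bar{x})$ is a formula of
the form $cq_1(\bar{x}) \vee \ldots \vee cq_n(\bar{x})$. In the following, for simplicity, the term query refers
to a union of conjunctive queries, unless otherwise specified. Given a database $\mathcal{F}$ for
$\mathcal{R}$, and a query $q(\bar{x})$, the answer to $q$ is the set of $n$-tuples of values
$ans(q, \mathcal{F}) = \{ t : \mathcal{F} \models q(t) \}$.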
2.1 The Data Integration Model

A data integration system is formalized (Lenzerini 2002) as a triple
$\mathcal{I} = \langle \mathcal{G}, \mathcal{S}, \mathcal{M} \rangle$ where

• $\mathcal{G}$ is the global schema. A global database for $\mathcal{I}$ is any database for $\mathcal{G}$;
• $\mathcal{S}$ is the source schema. A source database for $\mathcal{I}$ is any database consistent w.r.t. $\mathcal{S}$;
• $\mathcal{M}$ is the global-as-view (GAV) mapping, which associates each element $g$ in $names(\mathcal{G})$
with a union of conjunctive queries over $\mathcal{S}$.

Let $\mathcal{F}$ be a source database for $\mathcal{I}$. The retrieved global database is

$ret(\mathcal{I}, \mathcal{F}) = \{ g(t) : g \in names(\mathcal{G}) \wedge t \in ans(q, \mathcal{F}) \wedge q \in \mathcal{M}(g) \}$

for $\mathcal{G}$ satisfying the mapping. Note that, when source data are combined in a unified schema
with its own ICs, the retrieved global database might be inconsistent.

In the following, when it is clear from the context, we simply use the symbol $\mathcal{D}$ to
denote the retrieved global database $ret(\mathcal{I}, \mathcal{F})$. In fact, all results provided in the paper
hold for any database $\mathcal{D}$ complying with some schema $\mathcal{G}$ but possibly inconsistent w.r.t.
the constraints of $\mathcal{G}$.

Example 1

Consider a bank association that desires to unify the databases of two branches. The first
(source) database models managers by using a relation man(code, name) and employees
by a relation emp(code, name), where code is a primary key for both tables. The
second database stores the same data in a relation employee(code, name, role). Suppose
that the data have to be integrated under a global schema with two relations m(code) and
e(code, name), where the global ICs are:

• $\forall X_1, X_2, X_3 \; \neg [\, e(X_1, X_2) \wedge e(X_1, X_3) \wedge X_2 \neq X_3 \,]$, namely, code is the key of e;

• $\forall X_1 \, [\, m(X_1) \rightarrow \exists X_2 \; e(X_1, X_2) \,]$, i.e., an IND imposing that each manager code must
be an employee code as well.

The mapping is defined by the following Datalog rules (as usual, see Abiteboul et al. 1995):

e(Xc, Xn) :- emp(Xc, Xn).            m(Xc) :- man(Xc, _).
e(Xc, Xn) :- employee(Xc, Xn, _).    m(Xc) :- employee(Xc, _, 'manager').

Assume that emp stores tuples ('e1', 'john'), ('e2', 'mary'), ('e3', 'willy'), man stores
('e1', 'john'), and employee stores ('e1', 'ann', 'manager'), ('e2', 'mary', 'manager'),
('e3', 'rose', 'emp'). It is easy to verify that, although the source databases are consistent w.r.t.
local constraints, the global database, obtained by evaluating the mapping, violates the key
constraint on e, as both john and ann have the same code e1, and both willy and rose have
the same code e3 in table e. □
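The retrieval step of Example 1 can be replayed with a few lines of Python. This is only an illustrative sketch, not part of the system described in the paper: the function names (`retrieve_global_database`, `key_violations`, `ind_violations`) are hypothetical, and the four mapping rules are hard-coded rather than parsed.

```python
from collections import defaultdict

# Source relations of Example 1.
emp = {("e1", "john"), ("e2", "mary"), ("e3", "willy")}
man = {("e1", "john")}
employee = {("e1", "ann", "manager"), ("e2", "mary", "manager"), ("e3", "rose", "emp")}


def retrieve_global_database():
    """Evaluate the four GAV mapping rules and return the global relations e and m."""
    e = set(emp) | {(code, name) for (code, name, _role) in employee}
    m = {code for (code, _name) in man} | {
        code for (code, _name, role) in employee if role == "manager"
    }
    return e, m


def key_violations(e):
    """Group e-tuples by code and keep the codes associated with more than one name."""
    names_by_code = defaultdict(set)
    for code, name in e:
        names_by_code[code].add(name)
    return {code: names for code, names in names_by_code.items() if len(names) > 1}


def ind_violations(e, m):
    """Manager codes that do not occur as an employee code in e."""
    return m - {code for code, _name in e}


e, m = retrieve_global_database()
print(key_violations(e))
print(ind_violations(e, m))
```

On this data the key of e is violated for codes e1 and e3, while the inclusion dependency from m to e holds.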



2.2 Consistent Query Answering under different semantics 

In case a database $\mathcal{D}$ violates ICs, one can still be interested in querying the "consistent"
information originating from $\mathcal{D}$. One possibility is to "repair" $\mathcal{D}$ (by inserting or deleting
tuples) in such a way that all the ICs are satisfied. But there are several ways to "repair" $\mathcal{D}$.
As an example, in order to satisfy an IND of the form $r_1 \rightarrow r_2$ one might either remove
violating tuples from $r_1$ or insert new tuples in $r_2$. Moreover, the repairing strategy depends
on the particular semantic assumption made on the data integration system. Semantic
assumptions may range from (strict) soundness to (strict) completeness. Roughly speaking,
completeness complies with the closed world assumption where missing facts are
assumed to be false; on the contrary, soundness complies with the open world assumption
where $\mathcal{D}$ may be incomplete. We next define consistent query answering under some relevant
semantics, namely loosely-exact, loosely-sound, and CM-complete (Arenas et al. 2003;
Calì et al. 2003a; Chomicki and Marcinkowski 2005). More formally, let $\Sigma$ denote a semantics,
and $\mathcal{D}$ a possibly inconsistent database for $\mathcal{G}$; a database $\mathcal{B}$ is said to be a $\Sigma$-repair
for $\mathcal{D}$ if it is consistent w.r.t. $\mathcal{G}$ and one of the following conditions holds:

1. $\Sigma$ = CM-complete, $\mathcal{B} \subseteq \mathcal{D}$, and $\nexists\, \mathcal{B}' \subseteq \mathcal{D}$ such that $\mathcal{B}'$ is consistent and $\mathcal{B}' \supset \mathcal{B}$;

2. $\Sigma$ = loosely-sound, and $\nexists\, \mathcal{B}'$ such that $\mathcal{B}'$ is consistent and $\mathcal{B}' \cap \mathcal{D} \supset \mathcal{B} \cap \mathcal{D}$;

3. $\Sigma$ = loosely-exact, and $\nexists\, \mathcal{B}'$ such that $\mathcal{B}'$ is consistent and $\mathcal{B}' \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$.

The CM-complete semantics allows a minimal (w.r.t. set inclusion) set of deletions in each repair,
so as to avoid empty repairs if possible, but does not allow insertions. The loosely-sound
semantics allows insertions and a minimal amount of deletions. Finally, the loosely-exact
semantics allows both insertions and deletions by minimization of the symmetric difference
between $\mathcal{D}$ and the repairs.
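The three repair conditions differ only in which set comparison rules out a candidate. The following Python sketch makes this explicit on a tiny database; it is an illustration under stated assumptions, not the paper's procedure: the function `strictly_better`, the fact encoding as plain tuples, and the inserted fact `("e", "e4", "bob")` are all hypothetical, and both compared databases are assumed to be consistent already.

```python
def strictly_better(candidate, current, retrieved, semantics):
    """True if `candidate` rules out `current` as a repair of `retrieved`.

    Both databases are assumed to be consistent; only the preference
    criterion of each semantics is checked here.
    """
    if semantics == "CM-complete":
        return candidate <= retrieved and current < candidate
    if semantics == "loosely-sound":
        return (current & retrieved) < (candidate & retrieved)
    if semantics == "loosely-exact":
        return (candidate ^ retrieved) < (current ^ retrieved)
    raise ValueError(f"unknown semantics: {semantics}")


# Retrieved database with a key violation on the first attribute of e.
D = frozenset({("e", "e1", "john"), ("e", "e1", "ann"), ("e", "e2", "mary")})

# Two consistent candidates: one only deletes, the other also inserts a fact.
B_delete_only = frozenset({("e", "e1", "john"), ("e", "e2", "mary")})
B_with_insert = B_delete_only | {("e", "e4", "bob")}  # hypothetical extra fact

print(strictly_better(B_delete_only, B_with_insert, D, "loosely-sound"))
print(strictly_better(B_delete_only, B_with_insert, D, "loosely-exact"))
```

Under the loosely-sound criterion neither candidate dominates the other, since both keep the same facts of the retrieved database; under the loosely-exact criterion the candidate with the gratuitous insertion is ruled out because its symmetric difference is strictly larger.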

Definition 1

Let $\mathcal{D}$ be a database for a schema $\mathcal{G}$, and $\Sigma$ be a semantics. The consistent answer to a
query $q$ w.r.t. $\mathcal{D}$ is the set
$ans_{\Sigma}(q, \mathcal{G}, \mathcal{D}) = \{ t : t \in ans(q, \mathcal{B}) \text{ for each } \Sigma\text{-repair } \mathcal{B} \text{ for } \mathcal{D} \}$.
Consistent Query Answering (CQA) is the problem of computing $ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$. □

Observe that other semantics have been considered in the literature, like sound, complete,
exact, loosely-complete, etc. (Calì et al. 2003a); however, some of them are trivial
for CQA. As an example, in the exact semantics CQA makes sense only if the retrieved
database is already consistent with the global constraints, whereas in the complete and
loosely-complete semantics CQA always returns an empty answer. Note that the semantics
considered in this paper cover a wide and significant range of ways to repair the retrieved
database which are also relevant for CQA.

Example 2

Following Example 1, the retrieved global database admits exactly the following repairs
under the CM-complete semantics:

$\mathcal{B}_1$ = {e('e2','mary'), e('e1','john'), e('e3','willy'), m('e1'), m('e2')}
$\mathcal{B}_2$ = {e('e2','mary'), e('e1','john'), e('e3','rose'), m('e1'), m('e2')}
$\mathcal{B}_3$ = {e('e2','mary'), e('e1','ann'), e('e3','willy'), m('e1'), m('e2')}
$\mathcal{B}_4$ = {e('e2','mary'), e('e1','ann'), e('e3','rose'), m('e1'), m('e2')}

Query m(X), asking for the list of manager codes, has then both e1 and e2 as consistent answers,
whereas the query e(X, Y), asking for the list of employees, has only ('e2', 'mary')
as consistent answer (it is the only tuple of e occurring in every CM-complete repair). □
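The four repairs and the consistent answers of Example 2 can be reproduced by brute force on this small instance. The sketch below is only meant to illustrate the definitions: it enumerates all subsets of the retrieved database, which is exponential and unrelated to the ASP-based technique of the paper, and the names `is_consistent`, `cm_complete_repairs` and `consistent_answers` are hypothetical.

```python
from itertools import combinations

# Retrieved global database of Example 1, as facts (relation, tuple).
D = frozenset({
    ("e", ("e1", "john")), ("e", ("e1", "ann")), ("e", ("e2", "mary")),
    ("e", ("e3", "willy")), ("e", ("e3", "rose")),
    ("m", ("e1",)), ("m", ("e2",)),
})


def is_consistent(db):
    """Check the key of e (first attribute) and the IND from m to e."""
    codes = [t[0] for (r, t) in db if r == "e"]
    if len(codes) != len(set(codes)):  # two e-facts share a code
        return False
    return all(t[0] in codes for (r, t) in db if r == "m")


def cm_complete_repairs(db):
    """Return the maximal (w.r.t. set inclusion) consistent subsets of db."""
    facts = list(db)
    consistent = [
        frozenset(subset)
        for k in range(len(facts) + 1)
        for subset in combinations(facts, k)
        if is_consistent(subset)
    ]
    return [b for b in consistent if not any(b < other for other in consistent)]


def consistent_answers(relation, repairs):
    """Tuples of `relation` that belong to every repair (atomic queries only)."""
    per_repair = [{t for (r, t) in b if r == relation} for b in repairs]
    return set.intersection(*per_repair)


repairs = cm_complete_repairs(D)
print(len(repairs))
print(consistent_answers("m", repairs))
print(consistent_answers("e", repairs))
```

The helper `consistent_answers` only handles atomic queries such as m(X) and e(X, Y); a general union of conjunctive queries would need a proper query evaluator.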
2.3 Restricted Classes of Integrity Constraints

The problem of computing CQA, under general combinations of ICs, is undecidable (Abiteboul et al. 1995).
However, restrictions on ICs can be imposed to retain decidability and to identify tractable cases.

Definition 2

Let $r$ be a relation name of arity $n$, and $\pi$ be a set of $m < n$ indices from $I = \{1, \ldots, n\}$.
A key dependency (KD) for $r$ consists of a set of $n - m$ DCs, exactly one for each index
$j \in I - \pi$, of the form $\forall \bar{x}_1, \bar{x}_2 \; \neg [\, r(\bar{x}_1) \wedge r(\bar{x}_2) \wedge \bar{x}_1^{\{j\}} \neq \bar{x}_2^{\{j\}} \,]$
where no variable occurs twice in each $\bar{x}_i$ ($1 \leq i \leq 2$), $|\bar{x}_1| = |\bar{x}_2| = n$, the sequence
$\bar{x}_1^{\pi}$ exactly coincides with $\bar{x}_2^{\pi}$, and $\bar{x}_1^{\{j\}}$ is distinct from $\bar{x}_2^{\{j\}}$ for each
$j \in I - \pi$. The set $\pi$ is called the primary key of $r$ and is denoted by $key(r)$. We assume that
at most one KD is specified for each relation (Calì et al. 2003a). Finally, for each relation name $r'$
for which no DC is explicitly specified, we say, without loss of generality, that
$key(r') = \{1, \ldots, arity(r')\}$. □

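Definition 3

Given an inclusion dependency $d$ of the form $\forall \bar{x}_{\forall} \, [\, r_1(\bar{x}_1) \rightarrow \exists \bar{x}_{2\exists} \; r_2(\bar{x}_2) \,]$,
we denote by $\pi_1^d \subseteq \{1, \ldots, arity(r_1)\}$ and $\pi_2^d \subseteq \{1, \ldots, arity(r_2)\}$ the two sets of
indices induced by the positions of the variables $\bar{x}_{\forall}$ in $\bar{x}_1$ and $\bar{x}_2$, respectively.
More formally, $\pi_1^d = \{ i : \text{the } i\text{-th variable of } \bar{x}_1 \text{ is universally quantified in } d \}$ and
$\pi_2^d = \{ i : \text{the } i\text{-th variable of } \bar{x}_2 \text{ is universally quantified in } d \}$. □

For example, let $d$ denote the IND $\forall X_1, X_2 \, [\, r_1(X_1, X_3, X_2) \rightarrow \exists X_4 \; r_2(X_4, X_2, X_1) \,]$.
We have that $\pi_1^d = \{1, 3\}$ and $\pi_2^d = \{2, 3\}$.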

Definition 4

An IND $d$ is said to be

• a foreign key (FK) if $\pi_2^d = key(r_2)$ (Abiteboul et al. 1995);

• a foreign superkey (FSK) if $\pi_2^d \supseteq key(r_2)$ (Levene and Vincent 2000);

• non-key-conflicting (NKC) if $\pi_2^d \not\supset key(r_2)$ (Calì et al. 2003a). □

Definition 5

An FSK $d$ of the form $r_1 \rightarrow r_2$ is said to be safe (SFSK) if $\pi_1^d \subseteq key(r_1)$. In particular, if
$d$ is a safe FK we call it an SFK. □

For example, let $d$ denote the FSK $\forall X_1, X_2 \, [\, r_1(X_1, X_3, X_2) \rightarrow \exists X_4 \; r_2(X_4, X_2, X_1) \,]$
where $key(r_2) = \{3\}$. Thus, if $key(r_1) = \{1, 3\}$, $d$ is SFSK, whereas if $key(r_1) = \{1, 2\}$,
$d$ is not SFSK.
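The index sets of Definition 3 and the classes of Definitions 4 and 5 are easy to compute mechanically. The Python sketch below is a hedged illustration: the helper names `index_sets` and `classify` are hypothetical, variables inside each atom are assumed to be distinct, FSK is read as "superset or equal" and NKC as "not a proper superset" of the key of the right-hand relation.

```python
def index_sets(universal_vars, lhs_vars, rhs_vars):
    """Return (pi_1, pi_2): the 1-based positions of the universally quantified
    variables in the left-hand and right-hand atoms of an IND."""
    pi_1 = {i for i, v in enumerate(lhs_vars, start=1) if v in universal_vars}
    pi_2 = {i for i, v in enumerate(rhs_vars, start=1) if v in universal_vars}
    return pi_1, pi_2


def classify(pi_1, pi_2, key_r1, key_r2):
    """Labels of Definitions 4 and 5 that apply to an IND r1 -> r2."""
    labels = set()
    if pi_2 == key_r2:
        labels.add("FK")
    if pi_2 >= key_r2:          # superset or equal (assumed reading of FSK)
        labels.add("FSK")
    if not pi_2 > key_r2:       # not a proper superset (assumed reading of NKC)
        labels.add("NKC")
    if "FSK" in labels and pi_1 <= key_r1:
        labels.add("SFSK")
        if "FK" in labels:
            labels.add("SFK")
    return labels


# The IND of the running example: r1(X1, X3, X2) -> exists X4 r2(X4, X2, X1).
pi_1, pi_2 = index_sets({"X1", "X2"}, ("X1", "X3", "X2"), ("X4", "X2", "X1"))
print(pi_1, pi_2)
print(classify(pi_1, pi_2, key_r1={1, 3}, key_r2={3}))
print(classify(pi_1, pi_2, key_r1={1, 2}, key_r2={3}))
```

With $key(r_2) = \{3\}$ the dependency is an FSK but not an FK, and it is safe exactly when $\{1, 3\}$ is contained in $key(r_1)$, matching the example in the text.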

Table 1 summarizes known and new results about computability and complexity of CQA
under relevant classes of ICs and the three semantic assumptions considered in this paper.
In particular, given a query $q$ (without comparison atoms if $\Sigma \in \{$loosely-sound, loosely-exact$\}$),
we refer to the decision problem of establishing whether a tuple from $dom(\mathcal{D})$ belongs
to $ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ or not. Note that Chomicki and Marcinkowski (2005) have proved
computability and complexity of CQA for the CM-complete semantics in the case of conjunctive
queries with comparison predicates. However, since in such a setting there is a finite
number of repairs, each of finite size, their results straightforwardly hold for unions of
conjunctive queries as well. New decidability and complexity results for CQA under KDs
and SFSKs only, with $\Sigma \in \{$loosely-sound, loosely-exact$\}$, are proved in Section 2.4.
Table 1. Data Complexity of CQA (distinguishing between cyclic/acyclic INDs)

DCs   INDs   loosely-sound        loosely-exact        CM-complete
no    any    in PTIME (1)         in PTIME (1)         in PTIME (2)
KD    no     coNP-c (1)           coNP-c (1)           coNP-c (2)
KD    NKC    coNP-c (1)           $\Pi^p_2$-c (1)      in $\Pi^p_2$ (2) / in coNP (2)
KD    SFSK   in $\Pi^p_2$ (3)     in $\Pi^p_2$ (3)     in $\Pi^p_2$ (2) / in coNP (2)
KD    any    undec. (4)           undec. (4)           in $\Pi^p_2$ (2) / in coNP (2)
any   any    undec. (4)           undec. (4)           $\Pi^p_2$-c (2) / coNP-c (2)

(1) Calì et al. 2003a   (2) Chomicki and Marcinkowski 2005   (3) Section 2.4   (4) Abiteboul et al. 1995



2.4 Loosely-exact and Loosely-sound semantics under KD and SFSK 

In this section we provide new decidability and complexity results for CQA under both
the loosely-exact and the loosely-sound semantics with KDs and SFSKs. In the rest of the
section we always denote by:

• $\mathcal{G}$, a schema containing KDs and SFSKs only;

• $\mathcal{D}$, a possibly inconsistent database for $\mathcal{G}$;

• $q$, a union of conjunctive queries without comparison atoms;

• $\Sigma \in \{$loosely-exact, loosely-sound$\}$.

We first show that, under the aforementioned hypotheses, the size of each repair is finite.

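Definition 6

Let $\mathcal{B}$ be a $\Sigma$-repair for $\mathcal{D}$ and $i \geq 0$ be a natural number. We inductively define the sets
$\mathcal{B}^i$ as follows:

1. If $i = 0$, then $\mathcal{B}^0 = \mathcal{B} \cap \mathcal{D}$.

2. If $i > 0$, then $\mathcal{B}^i \subseteq \mathcal{B} - (\mathcal{B}^0 \cup \ldots \cup \mathcal{B}^{i-1})$ is arbitrarily chosen in such a way that
its facts are necessary and sufficient for satisfying all the INDs in $constr(\mathcal{G})$ that are
violated in $\mathcal{B}^0 \cup \ldots \cup \mathcal{B}^{i-1}$. □

Observe that $\mathcal{B} = \bigcup_{i \geq 0} \mathcal{B}^i$ and that $\mathcal{B}^i \cap \mathcal{B}^j = \emptyset$ for each $j \neq i$.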

Lemma 1

Let $\mathcal{B}$ be a $\Sigma$-repair for $\mathcal{D}$, then

1. The key of each fact in $\mathcal{B}$ only contains values from $dom(\mathcal{D})$.
2. $|\mathcal{B}|$ is finite.

Proof

(1) Let $i > 0$ be a natural number. Let $r_i(t_i)$ be a fact in $\mathcal{B}^i$ such that there is an index
$j \in key(r_i)$ for which $t_i^j \notin dom(\mathcal{B}^0)$. Let $r_{i-1}(t_{i-1})$ be one of the facts in $\mathcal{B}^{i-1}$ that forces
the presence of $r_i(t_i)$ in $\mathcal{B}^i$ for satisfying some IND, say $d$. (Note that, by Definition 6,
there must be at least one such fact because $\mathcal{B}^i$ would otherwise violate condition 2,
since $r_i(t_i)$ would be unnecessary.) Moreover, since $d$ is a safe FSK, there must exist
an index $k \in key(r_{i-1})$ such that $t_i^j = t_{i-1}^k$. Thus, $r_{i-1}(t_{i-1})$ contains a value that is not
in $dom(\mathcal{B}^0)$ inside its key, as does $r_i(t_i)$. Since $i$ has been chosen arbitrarily, value
$t_i^j$ has to be part of a fact of $\mathcal{B}^0$, which is clearly a contradiction.

(2) Since the key of each fact in $\mathcal{B}$ can only contain values from $dom(\mathcal{B}^0)$, and
$|dom(\mathcal{B}^0)| \leq |\mathcal{B}^0| \cdot \alpha$ where $\alpha = \max\{ arity(g) : g \in names(\mathcal{G}) \}$, then
$|\mathcal{B}| \leq |names(\mathcal{G})| \cdot |dom(\mathcal{B}^0)|^{\alpha} \leq |names(\mathcal{G})| \cdot (\alpha \cdot |\mathcal{B}^0|)^{\alpha} \leq |names(\mathcal{G})| \cdot (\alpha \cdot |\mathcal{D}|)^{\alpha}$. □

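We next characterize representative databases for $\Sigma$-repairs.

Definition 7

Let $\mathcal{B}$ be a $\Sigma$-repair for $\mathcal{D}$. We denote by $homo(\mathcal{B})$ the (possibly infinite) set of databases
defined in such a way that $\mathcal{B}' \in homo(\mathcal{B})$ if and only if:

• $\mathcal{B}'$ can be obtained from $\mathcal{B}$ by replacing each value (if any) that is not in $dom(\mathcal{D})$
with a value from $\Gamma - dom(\mathcal{D})$; and

• none of the values in $\Gamma - dom(\mathcal{D})$ occurs twice in $\mathcal{B}'$.

Finally, we denote by $h_{\mathcal{B},\mathcal{B}'} : dom(\mathcal{B}') \rightarrow dom(\mathcal{B})$ the function (homomorphism)
associating values in $dom(\mathcal{B}')$ with values in $dom(\mathcal{B})$, where $h_{\mathcal{B},\mathcal{B}'}(a) = a$, for each
$a \in dom(\mathcal{D}) \cap dom(\mathcal{B}')$. □

Note that, since (by Lemma 1) the key of each fact in $\mathcal{B}$ only contains values from
$dom(\mathcal{D})$, $|\mathcal{B}'| = |\mathcal{B}|$ holds.

For example, if $\mathcal{B} = \{ p(1, \varepsilon_1, \varepsilon_2),\ q(2, \varepsilon_2, \varepsilon_1) \}$ with $dom(\mathcal{D}) = \{1, 2\}$ and
$key(p) = key(q) = \{1\}$, then all of the following databases are in $homo(\mathcal{B})$:
$\{ p(1, \varepsilon_1, \varepsilon_3),\ q(2, \varepsilon_2, \varepsilon_4) \}$, $\{ p(1, \varepsilon_4, \varepsilon_2),\ q(2, \varepsilon_3, \varepsilon_1) \}$ and
$\{ p(1, \varepsilon_5, \varepsilon_6),\ q(2, \varepsilon_7, \varepsilon_8) \}$.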

Lemma 2

If $\mathcal{B}$ is a $\Sigma$-repair for $\mathcal{D}$, then each $\mathcal{B}' \in homo(\mathcal{B})$ also is.

Proof

Let $\mathcal{B}' \in homo(\mathcal{B})$. First of all, we prove that $\mathcal{B}'$ is consistent w.r.t. $\mathcal{G}$. In particular,
since the key of each fact in $\mathcal{B}$ only contains values from $dom(\mathcal{D})$ (by Lemma 1),
$\mathcal{B}'$ cannot violate any KD (by Definition 7). Moreover, since each IND has to be satisfied
through values of a key (by definition of safe FSKs), and since the key of each fact in $\mathcal{B}$
only contains values from $dom(\mathcal{D})$ (by Lemma 1), $\mathcal{B}'$ cannot violate any IND (by
Definition 7).

We now prove that $\mathcal{B}'$ is a repair, first for the loosely-sound semantics and then for the
loosely-exact semantics.

[loosely-sound] If $\Sigma$ = loosely-sound, then observe that $\mathcal{B}' \cap \mathcal{D} = \mathcal{B} \cap \mathcal{D}$, by definition
of $homo(\mathcal{B})$. Thus, if $\mathcal{B}'$ was consistent but not a loosely-sound repair, there would exist a
loosely-sound repair $\mathcal{B}''$ such that $\mathcal{B}'' \cap \mathcal{D} \supset \mathcal{B}' \cap \mathcal{D} = \mathcal{B} \cap \mathcal{D}$. Contradiction.

[loosely-exact] If $\Sigma$ = loosely-exact, then assume that $\mathcal{B}$ is a loosely-exact repair but $\mathcal{B}'$
(although consistent w.r.t. $\mathcal{G}$) is not. By definition, there must be a loosely-exact repair $\mathcal{B}''$
such that $\mathcal{B}'' \ominus \mathcal{D} \subset \mathcal{B}' \ominus \mathcal{D}$. In particular, we distinguish three cases:

(1) $\mathcal{B}'' - \mathcal{D} = \mathcal{B}' - \mathcal{D}$ and $\mathcal{D} - \mathcal{B}'' \subset \mathcal{D} - \mathcal{B}'$

(2) $\mathcal{B}'' - \mathcal{D} \subset \mathcal{B}' - \mathcal{D}$ and $\mathcal{D} - \mathcal{B}'' = \mathcal{D} - \mathcal{B}'$

(3) $\mathcal{B}'' - \mathcal{D} \subset \mathcal{B}' - \mathcal{D}$ and $\mathcal{D} - \mathcal{B}'' \subset \mathcal{D} - \mathcal{B}'$

Case 1: Since, by Definition 7, for each fact in $\mathcal{B}$ there is a fact in $\mathcal{B}'$ with the same
key, if we could add the facts in $\mathcal{B}'' - \mathcal{B}'$ to $\mathcal{B}'$ without violating any KD, then such facts
could also be added to $\mathcal{B}$ without violating any KD. Moreover, if we could add to $\mathcal{B}'$ the
facts in $\mathcal{B}'' - \mathcal{B}'$ without violating any IND, then such facts could also be added to $\mathcal{B}$
preserving consistency. This follows by the definition of safe FSKs (because each IND
has to be satisfied through values of a key), by Lemma 1 (because the key of each fact in
a loosely-exact repair only contains values from $dom(\mathcal{D})$) and by Definition 7 (because
for each fact in $\mathcal{B}'$ there is a fact in $\mathcal{B}$ with the same key and with the same values from
$dom(\mathcal{D})$). Consequently, we could add all the facts in $\mathcal{B}'' - \mathcal{B}'$ to $\mathcal{B}$ preserving consistency.
But this is not possible since $\mathcal{B}$ is a loosely-exact repair.

Case 2: Since in $\mathcal{B}'$ we have unnecessary facts (those in $\mathcal{B}' - \mathcal{B}''$) or, equivalently, the
facts in $\mathcal{B}''$ do not violate any IND, the corresponding facts in $\mathcal{B}$ do not violate any
IND by Lemma 1 and by Definition 7. Consequently, if each fact $f \in \mathcal{B}$, such that there is
a fact $f' \in \mathcal{B}' - \mathcal{B}''$ that is homomorphic to $f$, was removed from $\mathcal{B}$, then we would obtain
a database preserving consistency and with a smaller symmetric difference than $\mathcal{B}$. But this
is not possible since $\mathcal{B}$ is a loosely-exact repair.

Case 3: Analogous considerations can be made by combining case 1 and case 2. □

We next define the finite database $\mathcal{D}^*$ having among its subsets a number of $\Sigma$-repairs
sufficient for solving CQA.

Definition 8

Let $c$ be a value in $\Gamma - dom(\mathcal{D})$. Consider the largest (possibly inconsistent) database, say
$\mathcal{C}$, constructible on the domain $dom(\mathcal{D}) \cup \{c\}$ such that $f \in \mathcal{C}$ iff the value $c$ does not
appear in the key of $f$. Let $\mathcal{N}$ be a fixed set of values arbitrarily chosen from $\Gamma - dom(\mathcal{D})$
whose cardinality is equal to the number of occurrences of $c$ in $\mathcal{C}$. We denote by $\mathcal{D}^*$ one
possible database for $\mathcal{G}$ obtained from $\mathcal{C}$ by replacing each occurrence of $c$ with a value
from $\mathcal{N}$ in such a way that each value in $\mathcal{N}$ occurs exactly once in $\mathcal{D}^*$. ($|\mathcal{C}| = |\mathcal{D}^*|$.) □

For example, if $dom(\mathcal{D}) = \{1, 2\}$ and $\mathcal{G} = \{p\}$ with $arity(p) = 2$ and $key(p) = \{1\}$,
then $\mathcal{C} = \{ p(1,1), p(1,2), p(1,c), p(2,1), p(2,2), p(2,c) \}$. Let us fix $\mathcal{N} = \{\varepsilon_1, \varepsilon_2\}$.
Thus, $\mathcal{D}^*$ has the following form: $\{ p(1,1), p(1,2), p(1,\varepsilon_1), p(2,1), p(2,2), p(2,\varepsilon_2) \}$.
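The construction of Definition 8 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's implementation: the function `build_d_star` and its schema encoding (relation name mapped to arity and 1-based key positions) are hypothetical, the placeholder `"c"` is assumed not to occur in the domain, and the fresh values of $\mathcal{N}$ are rendered as the strings `eps1`, `eps2`, and so on.

```python
from itertools import count, product


def build_d_star(domain, schema, placeholder="c"):
    """Build the databases C and D* of Definition 8.

    `schema` maps each relation name to a pair (arity, key), where key is a
    set of 1-based positions. Returns the pair (C, D*), each a set of facts
    of the form (relation, tuple).
    """
    fresh = (f"eps{i}" for i in count(1))  # hypothetical names for values in N
    c_db, d_star = set(), set()
    for rel, (arity, key) in sorted(schema.items()):
        for tup in product(sorted(domain) + [placeholder], repeat=arity):
            # Keep a fact only if the placeholder does not occur in its key.
            if any(tup[k - 1] == placeholder for k in key):
                continue
            c_db.add((rel, tup))
            # Each occurrence of the placeholder gets its own fresh value.
            d_star.add(
                (rel, tuple(next(fresh) if v == placeholder else v for v in tup))
            )
    return c_db, d_star


c_db, d_star = build_d_star({1, 2}, {"p": (2, {1})})
print(sorted(c_db, key=str))
print(sorted(d_star, key=str))
```

For the example in the text this yields six facts in both databases, with the two occurrences of the placeholder replaced by two distinct fresh values.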

Proposition 1 

The following hold: 

• $|\mathcal{D}^*| \leq \sum_{g \in \mathcal{G}} (|dom(\mathcal{D})| + 1)^{arity(g)} \leq \sum_{g \in \mathcal{G}} (arity(g) \cdot |\mathcal{D}| + 1)^{arity(g)}$



1 
Lemma 3

If $\mathcal{B}$ is a $\Sigma$-repair for $\mathcal{D}$, then there exists $\mathcal{B}' \in homo(\mathcal{B})$ such that $\mathcal{B}' \subseteq \mathcal{D}^*$.

Proof

$\mathcal{B}'$ can be obtained from $\mathcal{B}$ by replacing each fact $r(t_1) \in \mathcal{B}$ with the unique fact $r(t_2) \in \mathcal{D}^*$
such that for each $i \in [1..arity(r)]$ either $t_2^i = t_1^i$, if $t_1^i \in dom(\mathcal{D})$, or $t_2^i \in \mathcal{N}$, if
$t_1^i \notin dom(\mathcal{D})$. Moreover, note that, since $\mathcal{B}$ cannot contain two facts with the same key
and since keys only have values from $dom(\mathcal{D})$, each fact in $\mathcal{D}^*$ can replace at most
one fact in $\mathcal{B}$. Finally, $\mathcal{B}' \in homo(\mathcal{B})$ by Definition 7. □

Lemma 4

Let $\mathcal{B}$ be a $\Sigma$-repair for $\mathcal{D}$, $\mathcal{B}' \in homo(\mathcal{B})$, $q$ be a query, and $t$ be a tuple of values from
$dom(\mathcal{D})$. If $t \in ans(q, \mathcal{B}')$, then $t \in ans(q, \mathcal{B})$.

Proof

Let $q_i$ be one of the conjunctions in $q$. If $t \in ans(q_i, \mathcal{B}')$, then there is a substitution $\mu'$
from the variables of $q_i$ to values in $\Gamma$ such that $\mathcal{B}' \models q_i(t)$. But since, by Definition 7,
each fact in $\mathcal{B}'$ is univocally associated with a unique fact in $\mathcal{B}$ by preserving the values
in $dom(\mathcal{D})$, and since all the extra values in $\mathcal{B}'$ are distinct, there must also be a
substitution $\mu$ such that $\mathcal{B} \models q_i(t)$. In particular, let $x$ be a variable in $q_i$; we can define $\mu$
in such a way that $\mu(x) = h_{\mathcal{B},\mathcal{B}'}(\mu'(x))$, where $h_{\mathcal{B},\mathcal{B}'}$ is the homomorphism from $\mathcal{B}'$ to $\mathcal{B}$ (see
Definition 7). Clearly, if $t \in ans(q_i, \mathcal{B}')$ for at least one $q_i$ in $q$ then $t \in ans(q, \mathcal{B}')$ too
and, consequently, $t \in ans(q, \mathcal{B})$. □

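The next theorem states the decidability of CQA under both the loosely-exact and the
loosely-sound semantics with KDs and SFSKs only.

Theorem 1

Let $\mathcal{B}$ be a $\Sigma$-repair for $\mathcal{D}$, $q$ a query, and $t$ a tuple from $dom(\mathcal{D})$. Let $\mathbb{B} \subseteq 2^{\mathcal{D}^*}$ denote the
set of all $\Sigma$-repairs contained in $\mathcal{D}^*$. Then, $t \in ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ iff $t \in ans(q, \mathcal{B})$ $\forall \mathcal{B} \in \mathbb{B}$.

Proof

($\Rightarrow$) We have to prove that, if $t \in ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$, then $t \in ans(q, \mathcal{B})$ for each $\mathcal{B} \in \mathbb{B}$, or
equivalently, if $t \notin ans(q, \mathcal{B})$ for some $\mathcal{B} \in \mathbb{B}$, then $t \notin ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$. This follows by
the definition of $ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ and from the fact that $\mathbb{B}$ only contains $\Sigma$-repairs.

($\Leftarrow$) We have to prove that, if $t \in ans(q, \mathcal{B})$ for each $\mathcal{B} \in \mathbb{B}$, then $t \in ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$.
Assume that $t \in ans(q, \mathcal{B})$ for each $\mathcal{B} \in \mathbb{B}$ but $t \notin ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$. This would entail
that there is a repair $\mathcal{B}_0$ such that $t \notin ans(q, \mathcal{B}_0)$. But, since $t \notin ans(q, \mathcal{B}')$ for each
$\mathcal{B}' \in homo(\mathcal{B}_0)$ (by Lemma 4), and since $\mathbb{B} \cap homo(\mathcal{B}_0)$ always contains a repair, say $\mathcal{B}''$
(by Lemma 3), we have a contradiction, since $t \notin ans(q, \mathcal{B}'')$ has to hold whereas we
have assumed that $t \in ans(q, \mathcal{B})$ for each $\mathcal{B} \in \mathbb{B}$. □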

Decidability and complexity results, under KDs and SFSKs only, follow from Theorem 1.

Corollary 1

Let $\mathcal{G}$ be a global schema containing KDs and SFSKs only, $\mathcal{D}$ be a possibly inconsistent
database for $\mathcal{G}$, $q$ be a query, $\Sigma \in \{$loosely-exact, loosely-sound$\}$, and $t$ be a tuple of
values from $dom(\mathcal{D})$. The problem of establishing whether $t \in ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ is in $\Pi^p_2$, in
data complexity.

Proof

It suffices to prove that the problem of establishing whether $t \notin ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ is in $\Sigma^p_2$.
This can be done by (i) building $\mathcal{D}^*$, and (ii) guessing $\mathcal{B} \in 2^{\mathcal{D}^*}$ such that $\mathcal{B}$ is a $\Sigma$-repair
and $t \notin ans(q, \mathcal{B})$. Since, by Proposition 1, $|\mathcal{D}^*| \in O(|\mathcal{D}|^{\alpha})$ where
$\alpha = \max\{ arity(g) : g \in names(\mathcal{G}) \}$, step (i) (enumerate the facts of $\mathcal{D}^*$) can be done in
polynomial time. Since checking that $t \notin ans(q, \mathcal{B})$ can be done in PTIME, it remains to show
that checking whether $\mathcal{B}$ is a $\Sigma$-repair can be done in coNP.

[loosely-exact] If $\Sigma$ = loosely-exact, this task corresponds to checking that there is no
consistent $\mathcal{B}' \subseteq \mathcal{D} \cup \mathcal{B}$ such that $\mathcal{B}' \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$, where this last task is doable in PTIME.

[loosely-sound] If $\Sigma$ = loosely-sound, this task corresponds to checking that there is no
consistent $\mathcal{B}' \subseteq \mathcal{D}^*$ such that $\mathcal{B}' \cap \mathcal{D} \supset \mathcal{B} \cap \mathcal{D}$, where this last task is doable in PTIME.
Then the thesis follows. □



2.5 Equivalence of CQA under loosely-exact and CM-complete semantics 

In this section we define some relevant cases in which CQA under loosely-exact and CM- 
complete semantics coincide. 

Lemma 5

Given a database $\mathcal{D}$ for a schema $\mathcal{G}$, if $\mathcal{B}$ is a CM-complete repair for $\mathcal{D}$, then it is a
loosely-exact repair for $\mathcal{D}$.

Proof

Suppose that $\mathcal{B}$ is a CM-complete repair for $\mathcal{D}$ (so, it is consistent w.r.t. $\mathcal{G}$), but it is not a
loosely-exact one. This means that its symmetric difference with $\mathcal{D}$ can still be reduced.
But, by definition of the CM-complete semantics, $\mathcal{B}$ does not contain anything else but tuples
in $\mathcal{D}$, namely $\mathcal{B} - \mathcal{D} = \emptyset$. So, the only way of "improving" it is to extend it with tuples
from $\mathcal{D}$. But this is not possible because $\mathcal{B}$ is already maximal due to the CM-complete
semantics, namely the addition of any other tuple would violate at least one IC. □

Corollary 2

$ans_{loosely\text{-}exact}(q, \mathcal{G}, \mathcal{D}) \subseteq ans_{CM\text{-}complete}(q, \mathcal{G}, \mathcal{D})$

Proof

This directly follows by Lemma 5 in light of Definition 1. □

Theorem 2

There are cases where $ans_{loosely\text{-}exact}(q, \mathcal{G}, \mathcal{D}) \subset ans_{CM\text{-}complete}(q, \mathcal{G}, \mathcal{D})$.

Proof

By Chomicki and Marcinkowski (2005), stating that the two semantics are different, and
by Corollary 2. □

Proposition 2

Let $\mathcal{B}$ be a database consistent w.r.t. a set of ICs $C$.

1. If $C$ are DCs only, then each $\mathcal{B}' \subseteq \mathcal{B}$ is consistent w.r.t. $C$, as well.

2. If $C$ are INDs only, then $\mathcal{B} \cup \mathcal{B}'$ is consistent w.r.t. $C$ for each $\mathcal{B}'$ consistent w.r.t. $C$.

Proof

(1) Deletion of tuples cannot introduce new DC violations.

(2) Let $r(t)$ be a fact in $\mathcal{B}'$. Let $d_1$ be an IND of the form $r_1 \rightarrow r$ ($r \neq r_1$). Clearly, $r(t)$
cannot violate $d_1$ in any database because $r$ is in the right-hand side of $d_1$. In particular,
$r(t)$ cannot violate $d_1$ in $\mathcal{B} \cup \mathcal{B}'$. Let $d_2$ be an IND of the form $r \rightarrow r_2$ (possibly, $r = r_2$).
Since $r(t)$ does not violate $d_2$ in $\mathcal{B}'$, it cannot violate $d_2$ in $\mathcal{B} \cup \mathcal{B}'$. □

Theorem 3 

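Given a database $\mathcal{D}$ for a schema $\mathcal{G}$, let $\mathcal{B}$ be a loosely-exact repair for $\mathcal{D}$, and $\bar{\mathcal{B}} = \mathcal{B} \cap \mathcal{D}$.
There is a CM-complete repair $\mathcal{B}' \subseteq \bar{\mathcal{B}}$ for $\mathcal{D}$ if at least one of the following restrictions
holds:

I $\mathcal{G}$ contains DCs only (no INDs);
II $\mathcal{G}$ contains INDs only (no DCs);
III $\mathcal{G}$ contains KDs and FKs only, and $\mathcal{D}$ is consistent w.r.t. KDs;
IV $\mathcal{G}$ contains KDs and SFKs only.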

Proof 

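Case I: By Proposition 2, since $\mathcal{B}$ is consistent w.r.t. DCs, then $\bar{\mathcal{B}} \subseteq \mathcal{B}$ is consistent as well.
Now, if $\mathcal{B} - \mathcal{D} \neq \emptyset$, then we would have a contradiction because $\bar{\mathcal{B}} \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$ would
hold. Thus, $\mathcal{B} - \mathcal{D} = \emptyset$ and so $\bar{\mathcal{B}} = \mathcal{B}$ is already a CM-complete repair itself.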

Case II: Since there is no DC, there exists only one CM-complete repair, say $\mathcal{B}'$, obtained
from $\mathcal{D}$ after removing all the facts violating INDs. Now, if $\mathcal{B}'$ was not contained in $\mathcal{B}$,
then, by Proposition 2, $\mathcal{B}' \cup \mathcal{B}$ would still be consistent, that is a larger CM-complete
repair. Contradiction. Finally B = B' . 

Case III: Since $\mathcal{D}$ is consistent w.r.t. DCs, we have only one CM-complete repair, say
$\mathcal{B}'$, obtained from $\mathcal{D}$ after removing all the facts violating INDs. But, as in Case II, if the
set $\mathcal{B}' - \mathcal{B}$ was nonempty, then we could add all these facts into $\mathcal{B}$ without violating any
IND. Anyway, one of these facts, say $f$, could violate a DC due to a fact $f'$ in $\mathcal{B} - \mathcal{D}$.
Now, note that $f'$ is in $\mathcal{B}$ only for fixing an IND violation. But in this case, as we are
only considering FKs, there would be no reason to have $f'$ in $\mathcal{B}$ instead of $f$. So, we could
(safely) replace $f'$ with $f$ in $\mathcal{B}$, and neither a KD nor an FK would be violated. But this
leads to a contradiction. So, there is no fact in $\mathcal{B}'$ which is not in $\mathcal{B}$.

Case IV: First of all, we observe that if $\mathcal{B} - \mathcal{D} = \emptyset$, then either $\mathcal{B}$ is a CM-complete repair
or $\mathcal{B}$ is not a loosely-exact repair. So the statement holds. Now assume that $\mathcal{B} - \mathcal{D} \neq \emptyset$.
We distinguish three different cases:

(1) $\bar{\mathcal{B}}$ is both consistent and maximal (it is a CM-complete repair);

(2) $\bar{\mathcal{B}}$ is consistent but not maximal (it is not a CM-complete repair);

(3) $\bar{\mathcal{B}}$ is inconsistent (it is not a CM-complete repair).

In case (1), we have a contradiction because $\mathcal{B}$ is assumed to be a loosely-exact repair,
but it does not minimize the symmetric difference with $\mathcal{D}$ since $\bar{\mathcal{B}} \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$.

In case (2), we have again a contradiction because $\mathcal{B}$ is assumed to be a loosely-exact
repair but it does not minimize the symmetric difference with $\mathcal{D}$ since there is a
CM-complete repair $\mathcal{B}' \supset \bar{\mathcal{B}}$ such that $\mathcal{B}' \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$.

In case (3), we observe that since, by hypothesis, $\mathcal{B}$ is consistent, the inconsistency
of $\bar{\mathcal{B}}$ arises, by Proposition 2, only due to INDs. Now, assume that (i) $\bar{\mathcal{B}}$ contains a fact
$r_1(t_1)$; (ii) there is an IND $d$ of the form $\forall \mathbf{x}_{12}\, [r_1(\mathbf{x}_1) \to \exists \mathbf{x}_{23}\, r_2(\mathbf{x}_2)]$; (iii) there is no fact
for $r_2$ in $\bar{\mathcal{B}}$ satisfying $d$. This means that a fact of the form $r_2(t_2)$ must be in $\mathcal{B} - \mathcal{D}$, where
$t_1$ and $t_2$ agree on the attributes involved in $d$.

Now, we claim that there is no fact of the form $r_2(t_3)$ in $\mathcal{D} - \mathcal{B}$, where $t_3$ and $t_2$ agree on
the attributes of $r_2$ involved in $d$. Suppose that $\mathcal{D} - \mathcal{B}$ contained such a fact $r_2(t_3)$. Consider
the new database $(\mathcal{B} \cup \{r_2(t_3)\}) - \{r_2(t_2)\}$. This would necessarily be consistent because the
addition of $r_2(t_3)$ (after removing $r_2(t_2)$ as well) cannot violate any KD since $d$ is an FK
(remember that $key(r_2)$ coincides with the attributes of $r_2$ involved in $d$), and cannot violate
any IND since each IND $d'$ of the form $r_2 \to r_3$ is an SFK (remember that $key(r_2)$ contains the
attributes of $r_2$ involved in $d'$). But this is not possible because $\mathcal{B}$ is assumed to be a
loosely-exact repair, and $(\mathcal{B} \cup \{r_2(t_3)\}) - \{r_2(t_2)\}$ would improve the symmetric difference.
This means that no CM-complete repair can contain the tuple $r_1(t_1)$ (this goes in the direction
of the statement).

Let us call $\mathcal{B}'$ the consistent (w.r.t. both KDs and SFKs) database obtained from $\bar{\mathcal{B}}$ after
removing all the facts violating some IND. It remains to show that there is no other fact
$r_1(t_1)$ in $\mathcal{D} - \mathcal{B}'$ such that $\mathcal{B}' \cup \{r_1(t_1)\}$ does not violate any constraint. Assume that such a
fact $r_1(t_1)$ exists, then:

- $\mathcal{B}' \cup \{r_1(t_1)\}$ would not violate any IND;

- $\mathcal{B} \cup (\mathcal{B}' \cup \{r_1(t_1)\}) = \mathcal{B} \cup \{r_1(t_1)\}$ would not violate any IND, by Proposition 2;

- $\mathcal{B} \cup \{r_1(t_1)\}$ would violate some KD, since $\mathcal{B}$ is a loosely-exact repair.

Thus, there would necessarily be a fact in $\mathcal{B}$, say $r_1(t_2)$, not in $\mathcal{B}'$, with the same key
as $r_1(t_1)$. Since such a fact cannot be in $\bar{\mathcal{B}} - \mathcal{B}'$ because it does not violate any IND,
it must be in $\mathcal{B} - \mathcal{D}$. But this is not possible because we could replace $r_1(t_2)$ by $r_1(t_1)$
in $\mathcal{B}$ without violating any KD and also without violating any IND, since we are only
considering SFKs. But since $\mathcal{B}$ is already a repair, this is clearly a contradiction. Finally,
$\mathcal{B}'$ is a CM-complete repair. □

Corollary 3 

$ans_{\text{loosely-exact}}(q, \mathcal{G}, \mathcal{D}) = ans_{\text{CM-complete}}(q, \mathcal{G}, \mathcal{D})$ in the following cases:

- $\mathcal{G}$ contains DCs only (no INDs);

- $\mathcal{G}$ contains INDs only (no DCs);

- $\mathcal{G}$ contains KDs and FKs only, and $\mathcal{D}$ is consistent w.r.t. KDs;

- $\mathcal{G}$ contains KDs and SFKs only.

Proof

This directly follows by both Theorem 3 and Lemma 5, in light of Definition 1. □

Proposition 3 

In general, Theorem 3 does not hold in case $\mathcal{G}$ contains SFSKs and KDs only.

Proof 

Consider a database containing two relations of arity 2, namely $r$ and $s$. Moreover, the
schema contains the following ICs: $key(r) = \{1, 2\}$, $key(s) = \{1\}$ and $r(X, Y) \to s(X, Y)$.
Note that the last one is a safe FSK. Suppose also that a database $\mathcal{D}$ for this schema
contains the following facts: $r(a, b)$, $s(a, c)$. The loosely-exact repairs are $\mathcal{B}_1 = \{s(a, c)\}$
and $\mathcal{B}_2 = \{r(a, b), s(a, b)\}$, but only the first one is also a CM-complete repair. However,
$\bar{\mathcal{B}} = \mathcal{B}_2 \cap \mathcal{D} = \{r(a, b)\}$ is not a CM-complete repair (it is inconsistent). The only
consistent database contained in $\bar{\mathcal{B}}$ is the empty set, which is not a CM-complete repair
(deletions are not minimized). □
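The counterexample can be checked mechanically. The following Python sketch is an illustrative aside and not part of the original development: it enumerates candidate databases by brute force, under the assumption that candidate facts are restricted to $r(a, \cdot)$ and $s(a, \cdot)$ over the values $b$ and $c$. All function names are hypothetical.

```python
from itertools import combinations

# Facts are (relation, arguments) pairs. D is the database of the counterexample.
D = frozenset({("r", ("a", "b")), ("s", ("a", "c"))})

# Assumption: only facts built from a (first position) and b, c (second position).
UNIVERSE = [(rel, ("a", y)) for rel in ("r", "s") for y in ("b", "c")]

def consistent(db):
    """key(r) = {1,2} is trivially satisfied; check key(s) = {1} and r(X,Y) -> s(X,Y)."""
    s_facts = [args for rel, args in db if rel == "s"]
    key_ok = all(t1 == t2 or t1[0] != t2[0] for t1 in s_facts for t2 in s_facts)
    ind_ok = all(("s", args) in db for rel, args in db if rel == "r")
    return key_ok and ind_ok

def powerset(items):
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

CANDIDATES = [db for db in powerset(UNIVERSE) if consistent(db)]

def loosely_exact_repairs():
    """Consistent databases whose symmetric difference with D is subset-minimal."""
    diffs = {db: db ^ D for db in CANDIDATES}
    return {db for db in CANDIDATES
            if not any(diffs[other] < diffs[db] for other in CANDIDATES)}

def cm_complete_repairs():
    """Subset-maximal consistent databases contained in D."""
    inside = [db for db in CANDIDATES if db <= D]
    return {db for db in inside if not any(db < other for other in inside)}
```

Under these assumptions the enumeration returns the two loosely-exact repairs and the single CM-complete repair discussed in the proof, and it confirms that the intersection of the second repair with the database is inconsistent.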



3 Computation of CQA via ASP 

In this section, we show how to exploit Answer Set Programming (ASP) (Gelfond and Lifschitz 1988;
Gelfond and Lifschitz 1991) for efficiently computing consistent answers to user queries
under different semantic assumptions. ASP is a powerful logic programming paradigm
allowing (in its general form) for disjunction in rule heads (Minker 1982) and nonmonotonic
negation in rule bodies. In the following, we assume that the reader is familiar
with ASP with aggregates, and in particular we adopt the DLV syntax (Faber et al. 2010;
Leone et al. 2006).

The suitability of ASP for implementing CQA has already been recognized in the literature
(Lenzerini 2002; Arenas et al. 2003; Bertossi et al. 2005; Chomicki and Marcinkowski 2005).
The general approaches are based on the following idea: produce an ASP program $P$ whose
answer sets represent possible repairs, so that the problem of computing CQA corresponds
to cautious reasoning on $P$. One of the hardest challenges in this context is the automatic
identification of a program $P$ considering a minimal number of repairs actually relevant to
answering user queries.

In order to face these challenges, we first introduce a general encoding which unifies in
a common core the solutions for CQA under the semantics considered in this paper. Then,
based on this unified framework, we define optimization strategies precisely aiming at
reducing the computational cost of CQA. This is done in several ways: (i) by casting down
the original program to complexity-wise easier programs; (ii) by identifying portions of the
database not requiring repairs at all, according to the query requirements; (iii) by exploiting
equivalence classes between some semantics in such a way as to adopt optimized solutions.

We next present the general encoding first and, then, the optimizations. 

3.1 General Encoding 

The general approach generates a program $\Pi_{cqa}$ and a new query $q_{cqa}$ obtained by rewriting
both the constraints and the query $q$ in such a way that CQA reduces to cautious reasoning
on $\Pi_{cqa}$ and $q_{cqa}$. Recall that a union of conjunctive queries in ASP is expressed as
a set of rules having the same head predicate with the same arity.

In what follows, we first present how to generate $\Pi_{cqa}$ and $q_{cqa}$ and then formally prove
under which hypotheses cautious reasoning on such $\Pi_{cqa}$ and $q_{cqa}$ corresponds to CQA.

Given a database $\mathcal{D}$ for a schema $\mathcal{G}$ and a query $q$ on $\mathcal{G}$, the ASP program $\Pi_{cqa}$ is created
by rewriting each IC belonging to $constr(\mathcal{G})$ and $q$ as follows:

Denial Constraints. Let $\Sigma \in$ {CM-complete, loosely-sound, loosely-exact}. For each DC
of the form $\forall \mathbf{x}_1, \ldots, \mathbf{x}_m\ \neg[g_1(\mathbf{x}_1) \wedge \ldots \wedge g_m(\mathbf{x}_m) \wedge \sigma(\mathbf{x}_1, \ldots, \mathbf{x}_m)]$ in $constr(\mathcal{G})$, insert
the following rule into $\Pi_{cqa}$:

• $g_1^c(\mathbf{x}_1) \vee \cdots \vee g_m^c(\mathbf{x}_m)$ :- $g_1(\mathbf{x}_1), \ldots, g_m(\mathbf{x}_m), \sigma(\mathbf{x}_1, \ldots, \mathbf{x}_m).$

This rule states that, in the presence of a violated denial constraint, the tuple(s) to be
removed in order to repair the database must be guessed.

Inclusion dependencies. Let $\Sigma \in$ {CM-complete, loosely-exact}. For each IND $d$ in
$constr(\mathcal{G})$ of the form $\forall \mathbf{x}_{12}\, [g_1(\mathbf{x}_1) \to \exists \mathbf{x}_{23}\, g_2(\mathbf{x}_2)]$, add the following rules into $\Pi_{cqa}$:

• $g_1^c(\mathbf{x}_1)$ :- $g_1(\mathbf{x}_1)$, #count$\{\mathbf{x}_{23} : g_2^c(\mathbf{x}_2)\}$ = #count$\{\mathbf{x}_{23} : g_2(\mathbf{x}_2)\}.$   if $|\mathbf{x}_{23}| > 0$

• $g_1^c(\mathbf{x}_1)$ :- $g_1(\mathbf{x}_1),\ g_2^c(\mathbf{x}_2).$   if $|\mathbf{x}_{23}| = 0$

• $g_1^c(\mathbf{x}_1)$ :- $g_1(\mathbf{x}_1),$ not $g_2(\mathbf{x}_2).$   if $|\mathbf{x}_{23}| = 0$

The first rule states that a tuple of $g_1$ must be deleted iff either all the tuples in $g_2$
previously referred to by $g_1$ via $d$ have been deleted due to the repairing process, or there
is no tuple in $g_2$ referred to by $g_1$ via $d$. (This is done by comparing the total count of
tuples in $g_2$ and $g_2^c$.) Observe that if there is a cyclic set of INDs, the set of rules
generated by this rewriting would contain recursive aggregates. Their semantics is described in
(Faber et al. 2010). The latter two rules replace the first one in the special case of $|\mathbf{x}_{23}| = 0$.

Repaired Relations. Let $\Sigma \in$ {CM-complete, loosely-sound, loosely-exact}. For each
relation name $g \in names(\mathcal{G})$, insert the following rule into $\Pi_{cqa}$:

• $g^r(\mathbf{x})$ :- $g(\mathbf{x}),$ not $g^c(\mathbf{x}).$
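As an illustrative aside, the rewriting of key constraints (a special case of denial constraints) and of repaired relations can be sketched as a small rule generator. The Python code below is only a sketch under stated assumptions: the suffixes `_c` and `_r` are a hypothetical naming scheme standing in for the superscripts $c$ and $r$, the variable names are invented, and the emitted strings are meant to resemble DLV rules rather than to reproduce the output of any actual system.

```python
def key_violation_rule(rel, arity, key, non_key_index):
    """Disjunctive rule guessing a deletion when two tuples of `rel` agree on the
    key (1-based positions in `key`) but differ on position `non_key_index`."""
    x1 = [f"K{i}" if i in key else f"X{i}" for i in range(1, arity + 1)]
    x2 = [f"K{i}" if i in key else f"Y{i}" for i in range(1, arity + 1)]
    a1, a2 = ",".join(x1), ",".join(x2)
    i = non_key_index
    return (f"{rel}_c({a1}) v {rel}_c({a2}) :- "
            f"{rel}({a1}), {rel}({a2}), X{i} != Y{i}.")

def repaired_rule(rel, arity):
    """Rule collecting in `rel_r` the tuples of `rel` that were not deleted."""
    args = ",".join(f"X{i}" for i in range(1, arity + 1))
    return f"{rel}_r({args}) :- {rel}({args}), not {rel}_c({args})."

def general_encoding_for_keys(schema):
    """schema: {relation name: (arity, set of key positions)} (hypothetical format)."""
    rules = []
    for rel, (arity, key) in sorted(schema.items()):
        for i in range(1, arity + 1):
            if i not in key:
                rules.append(key_violation_rule(rel, arity, key, i))
        rules.append(repaired_rule(rel, arity))
    return rules
```

For a binary relation `e` with key on the first position, the generator emits one disjunctive rule (for the single non-key position) and one rule defining the repaired relation.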

Query rewriting. Build $q_{cqa}(\mathbf{x})$ from $q(\mathbf{x})$ as follows:

1. If $\Sigma$ = loosely-sound, then apply onto $q$ the perfect rewriting algorithm that deals
with INDs described in (Calì et al. 2003b). (Observe that, when $\Sigma$ = loosely-sound, INDs
are not encoded into logic rules.)

2. For each atom $g(\mathbf{y})$ in $q$, replace $g(\mathbf{y})$ by $g^r(\mathbf{y})$.

The perfect rewriting introduced in (Calì et al. 2003b) is intuitively described next. Given
a query $q(\mathbf{x})$ and a set of INDs, the algorithm iteratively computes a new query $Q$ as follows.
$Q$ is first initialized with $q$; then, at each iteration it carries out the following two
steps: (1) For each conjunction $cq'$ in $Q$, and for each pair of atoms $g_1$, $g_2$ in $cq'$ that unify
(i.e., for which there exists a substitution transforming $g_1$ into $g_2$), $g_1$ and $g_2$ are substituted
by one single unifying atom. (2) For each conjunction $cq'$ in $Q$, and for each applicable
IND $d$ of the form $g_1 \to g$ such that $g$ is in $cq'$, it adds to $Q$ a new conjunction $cq''$
obtained from $cq'$ by interpreting $d$ as a rewriting rule on $g$, applied from right to left. The
algorithm stops when no further modifications are possible on $Q$ with the two steps above.

The following theorems show how and when cautious reasoning on $\Pi_{cqa}$ and $q_{cqa}$
corresponds to CQA. First we consider the CM-complete semantics.

Theorem 4 

Let $\Sigma$ = CM-complete, let $\mathcal{D}$ be a database for a schema $\mathcal{G}$ with arbitrary DCs and (possibly
cyclic) INDs, and let $q$ be a union of conjunctive queries. $t \in ans_\Sigma(q, \mathcal{G}, \mathcal{D})$ iff $q_{cqa}(t)$
is a cautious consequence of the ASP program $\mathcal{D} \cup \Pi_{cqa}$.

Proof 

We claim that $\Pi_{cqa}$ allows considering only and all the repairs, exactly one per model. Let
$\mathcal{B}^r$ be a repair. In the following, we describe how to obtain a model containing for each
relation, say $g$, exactly only and all the tuples of $g$ that do not appear in $\mathcal{B}^r$. We collect such
tuples in the new relation $g^c$, while we collect in $g^r$ only and all the tuples of $g$ appearing
in $\mathcal{B}^r$. For each relation, say $g$:

(a) By the disjunctive rules (if any) involving $g$, of the form

$\cdots \vee g^c(\mathbf{x}) \vee \cdots$ :- $\cdots,\ g(\mathbf{x}),\ \cdots,\ \sigma(\cdots, \mathbf{x}, \cdots).$

we guess a set of tuples of $g$, collected in $g^c$, that must not appear in $\mathcal{B}^r$.

(b) Next, for each IND of the form $g(\mathbf{x}_1) \to g_1(\mathbf{x}_2)$ (involving $g$ in the left-hand side),
we use the rule

$g^c(\mathbf{x}_1)$ :- $g(\mathbf{x}_1)$, #count$\{\mathbf{x}_{23} : g_1^c(\mathbf{x}_2)\}$ = #count$\{\mathbf{x}_{23} : g_1(\mathbf{x}_2)\}.$

for deciding which tuples of $g$ cannot appear in $\mathcal{B}^r$ due to an IND violation. Note
that in case $|\mathbf{x}_{23}| = 0$, the rule is rewritten without the #count aggregate.

(c) Finally, by the rule $g^r(\mathbf{x})$ :- $g(\mathbf{x}),$ not $g^c(\mathbf{x})$ we obtain the repaired relations.

Importantly, for computing the extension of each $g^c$ we only exploit the minimality of
answer sets semantics; later, the extension of each $g^r$ is computed. Observe that, by the
splitting theorem (Lifschitz and Turner 1994), $\Pi_{cqa}$ can be divided (split) into two parts. It
is clear that, by construction, $\Pi_{cqa}$ has exactly one answer set per repair. Finally, the query
is reorganized to exploit the repaired relations, and cautious reasoning does the rest. □

Example 3 

Consider again Example 2. The program (and the query built from $q(X)$ :- $m(X)$) obtained
for it under the CM-complete semantics is:

• $e^c(X_c, X_n) \vee e^c(X_c, X_n')$ :- $e(X_c, X_n),\ e(X_c, X_n'),\ X_n \neq X_n'.$

• $m^c(X_c)$ :- $m(X_c)$, #count$\{X_n : e^c(X_c, X_n)\}$ = #count$\{X_n : e(X_c, X_n)\}.$

• $e^r(X_c, X_n)$ :- $e(X_c, X_n),$ not $e^c(X_c, X_n).$

• $m^r(X_c)$ :- $m(X_c),$ not $m^c(X_c).$

• $q_{cqa}(X_c)$ :- $m^r(X_c).$

When this program is evaluated on the database we obtain four answer sets. It can be
verified that all the answer sets contain $m^r$('e1') and $m^r$('e2') (i.e., they are cautious
consequences of $\Pi_{cqa}$) and, thus, 'e1' and 'e2' are the consistent answers to the query. □
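To make the notion of cautious consequence concrete, the following Python sketch computes CM-complete repairs by brute force and intersects the query answers over all of them. It is only an illustration: the database instance is hypothetical (the facts of Example 2 are not reproduced here), chosen so that the key of $e$ is violated for two employee codes, and the helper names are invented.

```python
from itertools import combinations

# Hypothetical instance: m(code), e(code, name), key(e) = {1}, IND m(Xc) -> e(Xc, Xn).
D = frozenset({
    ("m", ("e1",)), ("m", ("e2",)),
    ("e", ("e1", "john")), ("e", ("e1", "mary")),
    ("e", ("e2", "ann")), ("e", ("e2", "bob")),
})

def consistent(db):
    """Check the key of e and the inclusion dependency from m to e."""
    codes = [args[0] for rel, args in db if rel == "e"]
    key_ok = len(codes) == len(set(codes))
    ind_ok = all(args[0] in codes for rel, args in db if rel == "m")
    return key_ok and ind_ok

def cm_complete_repairs(db):
    """Subset-maximal consistent subsets of db (tuple deletions only)."""
    facts = sorted(db)
    candidates = [frozenset(c)
                  for k in range(len(facts) + 1)
                  for c in combinations(facts, k)
                  if consistent(frozenset(c))]
    return [b for b in candidates if not any(b < other for other in candidates)]

def query(db):
    """q(X) :- m(X)."""
    return {args[0] for rel, args in db if rel == "m"}

def consistent_answers(db):
    """Answers that hold in every repair (cautious reasoning)."""
    return set.intersection(*(query(b) for b in cm_complete_repairs(db)))
```

On this assumed instance there are four repairs, one for each way of keeping a single name per employee code, and both codes are returned in every repair.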
Theorem 5

Let $\Sigma$ = loosely-sound, let $\mathcal{D}$ be a database for a schema $\mathcal{G}$ with KDs (and exactly one
key for each relation) and (possibly cyclic) NKC INDs, and let $q$ be a union of conjunctive
queries without comparison atoms. $t \in ans_\Sigma(q, \mathcal{G}, \mathcal{D})$ iff $q_{cqa}(t)$ is a cautious
consequence of the ASP program $\mathcal{D} \cup \Pi_{cqa}$.

Proof 

Considerations analogous to the CM-complete case can be drawn. Disjunctive rules guess a
minimal set of tuples to be removed, whereas the perfect rewriting algorithm allows dealing
with NKC INDs. Observe that the separation theorem introduced in (Calì et al. 2003b)
shows that INDs can be taken into account as if the KDs were not expressed on $\mathcal{G}$; in
particular, it states that it is sufficient to compute the perfect rewriting $q'$ of $q$ and evaluate
$q'$ on the maximal subsets of $\mathcal{D}$ consistent with KDs. In our case, these are computed by
the part of $\Pi_{cqa}$ dealing with KDs, whereas the separation is carried out by renaming each
$g$ in $q'$ by $g^r$. □

The general encoding for the loosely-exact semantics is inherently more complex than
the ones for loosely-sound and CM-complete, since both tuple deletions and tuple insertions
are subject to minimization. As a consequence, we tackled the loosely-exact encoding
by considering that there are common cases in which CQA under the loosely-exact semantics
and the CM-complete semantics actually coincide (see Corollary 3). These cases can
be easily checked and, thus, it is possible to handle the loosely-exact semantics with the
encoding defined for the CM-complete case.

Theorem 6 

Let $\Sigma$ = loosely-exact, and let $\mathcal{D}$ be a database for a schema $\mathcal{G}$ such that one of the following
holds:

- $\mathcal{G}$ contains DCs only (no INDs);

- $\mathcal{G}$ contains INDs only (no DCs);

- $\mathcal{G}$ contains KDs and FKs only, and $\mathcal{D}$ is consistent w.r.t. KDs;

- $\mathcal{G}$ contains KDs and SFKs only.

Let $q$ be a union of conjunctive queries. $t \in ans_\Sigma(q, \mathcal{G}, \mathcal{D})$ iff $q_{cqa}(t)$ is a cautious
consequence of the ASP program $\mathcal{D} \cup \Pi_{cqa}$.

Proof

Follows from Corollary 3 and Theorem 5. □

3.2 Optimized Solution 

The strategy reported in the previous section is a general solution for solving the CQA
problem but, in several cases, more efficient ASP programs can be produced. First of all,
note that the general algorithm blindly considers all the ICs on the global schema, including
those that have no effect on the specific query. Consequently, useless logic rules might
be produced which may slow down program evaluation. Then, a very simple optimization
may consist of considering relevant ICs only. However, there are several cases in which
the complexity of CQA stays in PTIME; but disjunctive programs, for which cautious
reasoning becomes a hard task (Eiter et al. 1997), are generated even in presence of denial
constraints only. This means that the evaluation of the produced logic programs might
be much more expensive than required in those "easy" cases. In the following, we provide
semantic-specific optimizations aiming to overcome such problems for the settings pointed
out in Theorem 4, Theorem 5 and Theorem 6.

Footnote: Recall that equalities are expressed in terms of variables having the same name.

Given a query $q$ and an atom $g$ in $q$, we define the set of relevant indices of $g$ in $q$, say
$relevant(q, g)$, in such a way that an index $i$ in $[1..arity(g)]$ belongs to $relevant(q, g)$ if
at least one of the following holds for an occurrence $g(X_1, \ldots, X_n)$ of $g$ in $q$:

• $X_i$ is not existentially quantified (it is a free variable, i.e., an output variable of $q$);

• $X_i$ is involved in some comparison atom (even if it is existentially quantified);

• $X_i$ appears more than once in the same conjunction;

• $X_i$ is a constant value.

If $g$ does not appear in $q$, we say that $relevant(q, g) = \emptyset$.

In the following, we denote by $\pi$ a set of indices. Moreover, given a sequence of variables
$\mathbf{x}$ and a set $\pi \subseteq \{1, \ldots, |\mathbf{x}|\}$, we denote by $\mathbf{x}^\pi$ the sequence obtained from $\mathbf{x}$ by discarding
a variable if its position is not in $\pi$. Finally, given a relation name $g$, a set of indices $\pi$ and
a label $\ell$, we denote by $g^{\ell\text{-}\pi}(\mathbf{x}^\pi)$ an auxiliary atom derived from $g$, marked by $\ell$, and using
only variables in $\mathbf{x}^\pi$.
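The definition of relevant indices can be sketched in a few lines of Python. The representation below is an assumption made only for illustration: a query is a list of conjunctions, each a dictionary with the head variables, the body atoms and the set of variables occurring in comparison atoms; variables start with an uppercase letter and `_` denotes an anonymous variable.

```python
from collections import Counter

def is_variable(term):
    """Variables start with an uppercase letter; anything else is a constant."""
    return term[:1].isupper()

def relevant(query, rel):
    """Relevant indices (1-based) of relation `rel` in a union of conjunctive queries."""
    indices = set()
    for cq in query:
        # How many times each named variable occurs in the atoms of this conjunction.
        counts = Counter(t for _, args in cq["body"] for t in args if is_variable(t))
        for name, args in cq["body"]:
            if name != rel:
                continue
            for i, term in enumerate(args, start=1):
                if term == "_":          # anonymous variable: never relevant
                    continue
                if (not is_variable(term)        # constant value
                        or term in cq["head"]    # free (output) variable
                        or term in cq["cmp"]     # involved in a comparison atom
                        or counts[term] > 1):    # repeated in the conjunction
                    indices.add(i)
    return indices
```

With this representation, a query shaped like Q4 of Section 4 yields indices {1, 3} for `student`, {1} for `enrollment` and {1, 4, 5} for `exam_record`, while a relation not occurring in the query gets the empty set.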

$\Sigma$ = loosely-sound. The objective of this optimization is to single out, for each relation
involved in the query, the set of attributes actually relevant to answer it and to apply the
necessary repairs only on them. As we show next, this may allow both to reduce (even
to zero) the number of disjunctive rules needed to repair key violations and to reduce the
cardinality of relations involved in such disjunctions.

Given a schema $\mathcal{G}$ and a query $q$, perform the following steps for building the program
$\Pi_{cqa}$ and the query $q_{cqa}$.

1. Apply the perfect rewriting algorithm that deals with INDs described in (Calì et al. 2003b).

2. Let $Q$ be the union of conjunctive queries obtained from $q$ after Step 1. For each
$g \in names(\mathcal{G})$, build the sets

$\pi_R^g = relevant(Q, g) \qquad \pi_S^g = \pi_R^g \cup key(g)$

These two sets capture the fact that a key attribute is relevant for the repairing process, but
it may not be strictly relevant for answering the query.

Observe that the perfect rewriting dealing with INDs must be applied before singling
out relevant attributes. In fact, $q$ may depend, through INDs, also on attributes of relations
not explicitly mentioned in it. However, in the last step of this algorithm the rewriting of
the query is completed by substituting each relation in the query with its repaired (and
possibly reduced) version.

3. For each $g \in names(\mathcal{G})$ such that $\pi_R^g \neq \emptyset$ and $key(g) \not\supseteq \pi_R^g$, add the following
rules into $\Pi_{cqa}$:

• $g^{sr\text{-}\pi_S}(\mathbf{x}^{\pi_S})$ :- $g(\mathbf{x}).$

• $g^{c\text{-}\pi_S}(\mathbf{x}_1^{\pi_S}) \vee g^{c\text{-}\pi_S}(\mathbf{x}_2^{\pi_S})$ :- $g^{sr\text{-}\pi_S}(\mathbf{x}_1^{\pi_S}),\ g^{sr\text{-}\pi_S}(\mathbf{x}_2^{\pi_S}),\ x_1^i \neq x_2^i.$   $\forall i \in \pi_S^g - key(g)$

• $g^{r\text{-}\pi_R}(\mathbf{x}^{\pi_R})$ :- $g^{sr\text{-}\pi_S}(\mathbf{x}^{\pi_S}),$ not $g^{c\text{-}\pi_S}(\mathbf{x}^{\pi_S}).$

Observe that if there exists at least one relevant non-key attribute for $g$, the repairing
process cannot be avoided; however, violations caused by irrelevant attributes only (i.e., not
in $\pi_S^g$) can be ignored, since the projection of $g$ on $\pi_S^g$ is still safe and sufficient for query
answering purposes.

4. For each $g \in names(\mathcal{G})$ such that $\pi_R^g \neq \emptyset$ and $key(g) \supseteq \pi_R^g$, add the following rule
into $\Pi_{cqa}$:

• $g^{r\text{-}\pi_R}(\mathbf{x}^{\pi_R})$ :- $g(\mathbf{x}).$

Observe that, if the relevant attributes of $g$ are a subset of its key, the repair process of $g$ for
key violations through disjunction can be avoided altogether. In fact, the projection of $g$ on $\pi_R^g$
is still safe and sufficient for query answering purposes. Moreover, for the same reason, it
is not necessary to take the whole key of $g$ into account.

5. For each atom of the form $g(\mathbf{x})$ in $Q$, replace $g(\mathbf{x})$ by $g^{r\text{-}\pi_R}(\mathbf{x}^{\pi_R})$.
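The case analysis of Steps 2 to 4 can be summarised by a small decision function. This Python sketch is only illustrative: the function name and the returned dictionary are hypothetical, and the inputs (relevant indices and key positions of a relation) are assumed to have been computed beforehand.

```python
def loosely_sound_plan(relevant_indices, key):
    """Decide how a relation is treated under the loosely-sound optimization.

    Returns the step that applies (3, 4, or None when the relation is irrelevant),
    the index sets pi_R and pi_S, and the non-key positions on which disjunctive
    rules are needed (empty unless Step 3 applies).
    """
    pi_r = set(relevant_indices)
    key = set(key)
    pi_s = pi_r | key                      # pi_S = pi_R united with key(g)
    if not pi_r:
        step = None                        # relation not needed by the query
    elif key >= pi_r:
        step = 4                           # plain projection, no disjunction
    else:
        step = 3                           # semi-repair plus disjunctive rules
    disjunctive_on = pi_s - key if step == 3 else set()
    return {"step": step, "pi_R": pi_r, "pi_S": pi_s, "disjunctive_on": disjunctive_on}
```

For instance, assuming a relation whose key is its first attribute, relevant indices {1, 3} lead to Step 3 with a disjunctive rule on position 3 only, whereas relevant indices {1} lead to Step 4 and no disjunction at all.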

$\Sigma$ = CM-complete. For the optimization of the CM-complete semantics, we exploit a
graph which is used to navigate the query and the database in order to single out those
relations and projections actually relevant for answering the query. Moreover, it allows
identifying possible cycles generated by ICs which must be suitably handled; in fact, acyclic
ICs induce a partial order among them and this information can be effectively exploited for
the optimization. On the contrary, cyclic ICs must be handled in a more standard way.

Given a schema $\mathcal{G}$ and a query $q$, build the directed labelled graph $G_q = \langle N, A \rangle$ as
follows:

• $N = \{q\} \cup names(\mathcal{G})$;

• $(g_1, g_2, c) \in A$ iff $c$ is a DC in $constr(\mathcal{G})$ involving both $g_1$ and $g_2$;

• $(g_1, g_2, d) \in A$ iff $d$ is an IND in $constr(\mathcal{G})$ of the form $g_1 \to g_2$;

• $(q, g, \epsilon) \in A$ iff $g$ appears in a conjunction of $q$.

Perform the following steps for building program $\Pi_{cqa}$:

1. Visit $G_q$ starting from node $q$;

2. Discard unreachable nodes and update the sets $N$ and $A$;

3. Partition the set $N$ in $(N_{cf}, N_{ncf})$ in such a way that a node $n$ belongs to $N_{cf}$ if it is
not involved in any cycle ($q$ always belongs to $N_{cf}$). Contrariwise, a node $n$ belongs
to $N_{ncf}$ if it is involved in some cycle.

4. For each node g E N — {q} compute the sets 

^R == (U(9i,3,d)gA t^r) U relevant{q, g); 

$\pi_S^g = \pi_R^g \cup key(g)$ only if $g$ has exactly one primary key as DCs; otherwise, $\pi_S^g = \pi_R^g$.

Here $\pi_R^g$ is the set of relevant variable indices of $g$, and $\pi_S^g$ adds to $\pi_R^g$ the key of $g$.

Observe that Steps 1-4 implement a pre-processing phase in which relevant relations and
their relevant indices are singled out, and each relevant relation is classified as cycle free
or non cycle free.
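The pre-processing phase can be sketched as follows in Python. This is an illustrative sketch under explicit assumptions: arc labels are dropped, a DC contributes arcs only between distinct relations (so that a key on a single relation does not count as a cycle), and all function names as well as the example arcs are hypothetical.

```python
def build_graph(query_relations, inds, dcs):
    """Arcs q -> g for each g in the query, g1 -> g2 for each IND g1 -> g2,
    and arcs in both directions between distinct relations sharing a DC."""
    arcs = {("q", g) for g in query_relations}
    arcs |= set(inds)
    for rels in dcs:
        arcs |= {(a, b) for a in rels for b in rels if a != b}
    return arcs

def reachable(arcs, start):
    """Nodes reachable from `start` (including `start`) by a depth-first visit."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b in arcs:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def partition(arcs, start="q"):
    """Discard nodes unreachable from the query node, then split the remaining
    nodes into cycle-free ones and nodes involved in some cycle."""
    nodes = reachable(arcs, start)
    arcs = {(a, b) for a, b in arcs if a in nodes and b in nodes}
    in_cycle = set()
    for n in nodes:
        successors = {b for a, b in arcs if a == n}
        if any(n in reachable(arcs, s) for s in successors):
            in_cycle.add(n)
    return nodes - in_cycle, in_cycle
```

A node is classified as non cycle free exactly when it can be reached again from one of its own successors; the query node has no incoming arcs, so it always ends up in the cycle-free class.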

5. For each node g G Ncf, if g has only one key as DCs, then add the following rules 

into Ucqa- 

d-\ d-\ di. di. 

where:

- $k \geq 0$ is the number of arcs in $G_q$ labelled by INDs, and outgoing from $g$;

- the pair $(\ell, x)$ is either $(r, R)$ or $(sr, S)$, according to whether $key(g) \supseteq \pi_R^g$
or not, respectively. Intuitively, if $key(g) \supseteq \pi_R^g$ holds, then the repair $g^{r\text{-}\pi_R}$ of
$g$ can be directly computed; otherwise the computation must first go through
a semi-reparation step for computing $g^{sr\text{-}\pi_S}$. Intuitively, this semi-reparation
step collects those tuples that violate no IND of the form $g \to g_i$, but that must
be anyway processed in order to fix some key violation (see Steps 6-10).

- atom $g_i^{d_i}$ is in the body of the first rule ($1 \leq i \leq k$) only if both $(g, g_i, d_i) \in A$
and $d_i$ is an IND of the form $g(\mathbf{x}) \to g_i(\mathbf{x}_i)$. This atom is just a projection
of $g_i^{r\text{-}\pi_R}(\mathbf{x}_i^{\pi_R})$.

6. For each node $g \in N_{cf}$, if $g$ has only one primary key as DCs, and $key(g) \not\supseteq \pi_R^g$,
and $g$ has incoming arcs only from $q$, and all the relevant variables of $g$ w.r.t. $q$ are
in the head of $q$, and each occurrence of $g$ in $q$ contains all of its relevant variables,
then add the following rules into $\Pi_{cqa}$ by considering that the key of $g$ is defined by
rules of the form $\forall \mathbf{x}_1, \mathbf{x}_2\ \neg[g(\mathbf{x}_1) \wedge g(\mathbf{x}_2) \wedge x_1^i \neq x_2^i]$:

• $g^{c\text{-}\pi_S}(\mathbf{x}_1^{\pi_S})$ :- $g^{sr\text{-}\pi_S}(\mathbf{x}_1^{\pi_S}),\ g^{sr\text{-}\pi_S}(\mathbf{x}_2^{\pi_S}),\ x_1^i \neq x_2^i.$   $\forall i \in \pi_S^g - key(g)$

• $g^{r\text{-}\pi_R}(\mathbf{x}_1^{\pi_R})$ :- $g^{sr\text{-}\pi_S}(\mathbf{x}_1^{\pi_S}),$ not $g^{c\text{-}\pi_S}(\mathbf{x}_1^{\pi_S}).$

7. For each node $g \in N_{cf}$, if $g$ has only one primary key as DCs, and $key(g) \not\supseteq \pi_R^g$,
and case 6 does not apply, then add the following rules into $\Pi_{cqa}$ by considering that
the key is defined by rules of the form $\forall \mathbf{x}_1, \mathbf{x}_2\ \neg[g(\mathbf{x}_1) \wedge g(\mathbf{x}_2) \wedge x_1^i \neq x_2^i]$:

• $g^{c\text{-}\pi_S}(\mathbf{x}_1^{\pi_S}) \vee g^{c\text{-}\pi_S}(\mathbf{x}_2^{\pi_S})$ :- $g^{sr\text{-}\pi_S}(\mathbf{x}_1^{\pi_S}),\ g^{sr\text{-}\pi_S}(\mathbf{x}_2^{\pi_S}),\ x_1^i \neq x_2^i.$   $\forall i \in \pi_S^g - key(g)$

• $g^{r\text{-}\pi_R}(\mathbf{x}_1^{\pi_R})$ :- $g^{sr\text{-}\pi_S}(\mathbf{x}_1^{\pi_S}),$ not $g^{c\text{-}\pi_S}(\mathbf{x}_1^{\pi_S}).$

Observe that, in this case, disjunctive rules are defined only on the set of relevant
indices that are not in the key and that each $g^{c\text{-}\pi_S}$ contains only the projection of
deleted tuples on the set $\pi_S^g$.

Here, Steps 5-7 handle relations for which a key is defined and that are classified as cycle free.
In particular, if $key(g) \supseteq \pi_R^g$ holds, key reparation can be avoided altogether (and thus
disjunctive rules too); otherwise a semi-reparation step is required, but Step 6 identifies further
cases in which, even if key reparation is needed, disjunction can still be avoided. Finally,
Step 7 handles all the other cases. Importantly, through Steps 5-7 we take into account
only the minimal projections of involved relations in order to reduce as much as possible
computational costs (and even disjunctive rules) by not considering irrelevant attributes.

8. For each node g £ Nncf add the following rules into licqa'- 

d d 

• ^''(x) :- ff(x), not g{ "(x/). 

for each IND d of the form (;(x) -^ 3i(xi) such that there is no cycle in Gq 
involving both gi and g; 

• ^'^(x) :- ^(x), #C0Unt{xi3 : 5f (xi)} = #C0Unt{xi3 : c/i(xi)}- 

for each IND d of the form Vxy [ ^(x) — > 3x23 51 (xi) ] such that gi G Nncf\ 

• ff'(xi) V 3'=(X2) :- 3(xi), ^(xa), x'l ^ x^- Vi G tt 
where tt = {1, . . . , arity{g)} — key{g) and the key of g is defined by DCs of 
the formVxi,X2 ^[g{xi) A g{x2) Ax^ ^ x^]; 

• 5'^''^«(x^«) :— g{x), not ^'^(x). 

if there is at least one node in Ncf with an arc to g, or g appears in q; 

9. For each DC of the form $\forall \mathbf{x}_1, \ldots, \mathbf{x}_m\ \neg[g_1(\mathbf{x}_1) \wedge \ldots \wedge g_m(\mathbf{x}_m) \wedge \sigma(\mathbf{x}_1, \ldots, \mathbf{x}_m)]$
involving at least two different relation names (entailing that each $g_i \in N_{ncf}$), add
the following rule into $\Pi_{cqa}$:

• $g_1^c(\mathbf{x}_1) \vee \cdots \vee g_m^c(\mathbf{x}_m)$ :- $g_1(\mathbf{x}_1), \ldots, g_m(\mathbf{x}_m), \sigma(\mathbf{x}_1, \ldots, \mathbf{x}_m).$

Steps 8 and 9 handle non cycle free relations; the repairing process in this case mimics the
standard rewriting, but projects relations on the relevant attributes whenever possible.

10. For each node g G Ncf if g is involved in DCs that do not form a primary key, then 
add the following rules into llcga: 

. ff-(x) :-^ff(x), 5r«\x^«'), ..., gr^Axl""). 

• gr^ (x?" ) :- gr^ (xl" )■ V* G [l..k] s.t. 4' D 4- 

• 5'^(xi) V •■• V3'=(x„) :- g'""(xi),---,3''''(x„),crd(xi,...,x„)- Vd 

• g'^"^"{x.'^") :— (^'"'(x), not g'^{x.). 

where:

- $k \geq 0$ is the number of arcs, labelled by INDs, outgoing from $g$;

- atom $g_i^{d_i}$ is in the body of the first rule ($1 \leq i \leq k$) iff both $(g, g_i, d_i) \in A$
and $d_i$ is an IND of the form $g(\mathbf{x}) \to g_i(\mathbf{x}_i)$;

- $d$ is a DC of the form $\forall \mathbf{x}_1, \ldots, \mathbf{x}_n\ \neg[g(\mathbf{x}_1) \wedge \ldots \wedge g(\mathbf{x}_n) \wedge \sigma_d(\mathbf{x}_1, \ldots, \mathbf{x}_n)]$.

Step 10 handles the special case in which there is no key for a relation but denial constraints
are defined (only) on it.

11. For each atom of the form $g(\mathbf{x})$ in $q$, replace $g(\mathbf{x})$ by $g^{r\text{-}\pi_R}(\mathbf{x}^{\pi_R})$.
Example 4

Consider again Example [U suppose to extend the global schema by adding the relation 
$c(code, name)$ which represents the list of customers, where $code$ is the primary key of $c$.
Moreover, suppose that we ask for the query $q(X_c, X_n)$ :- $c(X_c, X_n),\ e(X_c, X_n)$ retrieving
the customers that are also employees of the bank. In this case, after building the graph
$G_q$ it is easy to see that $m$ is unreachable (so it is discarded) and that both $c$ and $e$ comply
with the requirements described at Steps 5 and 6 of the optimized algorithm. Consequently,
the optimized program under the CM-complete semantics is:

$e^{sr\text{-}1,2}(X_c, X_n)$ :- $e(X_c, X_n).$
$c^{sr\text{-}1,2}(X_c, X_n)$ :- $c(X_c, X_n).$
$e^{c\text{-}1,2}(X_c, X_n)$ :- $e^{sr\text{-}1,2}(X_c, X_n),\ e^{sr\text{-}1,2}(X_c, X_n'),\ X_n \neq X_n'.$
$c^{c\text{-}1,2}(X_c, X_n)$ :- $c^{sr\text{-}1,2}(X_c, X_n),\ c^{sr\text{-}1,2}(X_c, X_n'),\ X_n \neq X_n'.$
$e^{r\text{-}1,2}(X_c, X_n)$ :- $e^{sr\text{-}1,2}(X_c, X_n),$ not $e^{c\text{-}1,2}(X_c, X_n).$
$c^{r\text{-}1,2}(X_c, X_n)$ :- $c^{sr\text{-}1,2}(X_c, X_n),$ not $c^{c\text{-}1,2}(X_c, X_n).$
$q_{cqa}(X_c, X_n)$ :- $c^{r\text{-}1,2}(X_c, X_n),\ e^{r\text{-}1,2}(X_c, X_n).$

Note that, since both $e$ and $c$ are not affected by IND violations, and they have no irrelevant
variables, the semi-reparation step cannot actually discard tuples. However, the obtained
program is non-disjunctive and stratified. Thus, it can be evaluated in polynomial time
(Leone et al. 2006).

In this case, the only answer set of the program contains the consistent answers to the
original query. □



$\Sigma$ = loosely-exact. In Section 2.5 we proved that there are common cases in which CQA
under the loosely-exact semantics and the CM-complete semantics actually coincide. As a
consequence, in these cases, all the optimizations defined for the CM-complete semantics
apply also to the loosely-exact semantics.



4 Experiments 

In this section we present some of the experiments we carried out to assess the effectiveness 
of our approach to consistent query answering. 

Testing has been performed by exploiting our complete system for data integration,
which is intended to simplify both the integration system design and the querying
activities by exploiting a user-friendly GUI. Indeed, this system supports the user in
designing the global schema and the mappings between global relations and source schemas,
and it allows specifying user queries over the global schema via a QBE-like interface.
The query evaluation engine adopted for the tests is $DLV^{DB}$ (Terracina et al. 2008)
coupled, via ODBC, with a PostgreSQL DBMS where input data were stored. $DLV^{DB}$ is
a DLP evaluator born as a database oriented extension of the well known DLV system
(Leone et al. 2006). It has been recently extended for dealing with unstratified negation,
disjunction and external function calls.

We first address tests on a real world scenario and then report on tests for scalability 
issues on synthetic data. 
Fig. 1. INFOMIX database.



4.1 Tests on a real world scenario 



Data Set. We have exploited the real-world data integration framework developed in the
INFOMIX project (IST-2001-33570) (Leone et al. 2005) which integrates data from a real
university context. In particular, considered data sources were available at the University
of Rome "La Sapienza". These comprise information on students, professors, curricula and
exams in various faculties of the university.

There are about 35 data sources in the application scenario, which are mapped into 12
global schema relations with 20 GAV mappings and 21 integrity constraints. We call this
data set Infomix in the following. Figure 1 reproduces the main characteristics of the global
database: each node corresponds to a global relation showing its arity and key. An edge
between $r_1$ and $r_2$ labelled by $r_1[I] \subseteq r_2[J]$ indicates an IND of the form $\forall \mathbf{x}_{12}\, [r_1(\mathbf{x}_1) \to
\exists \mathbf{x}_{23}\, r_2(\mathbf{x}_2)]$ where $I$ and $J$ are the positions of $\mathbf{x}_{12}$ in $\mathbf{x}_1$ and $\mathbf{x}_2$, respectively; the arc is
labelled with the attributes of a and b involved in the IND. Observe that there are cyclic
INDs involving teaching, exam_record and professor.

Besides the original source database instance (which takes about 16 MB on the DBMS), we
obtained bigger instances artificially. Specifically, we generated a number of copies of
the original database; each copy is disjoint from the other ones but maintains the same
data correlations between instances as the original database. This has been carried out by
mapping each original attribute value to a new value having a copy-specific prefix.

Then, we considered two further datasets, namely Infomix-x-10 and Infomix-x-50, storing
10 copies (for a total amount of 160 MB of data) and 50 copies (800 MB) of the original
database, respectively. It holds that Infomix $\subset$ Infomix-x-10 $\subset$ Infomix-x-50.
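The generation of enlarged instances can be illustrated with a short Python sketch. It is not the script actually used for the experiments: the in-memory representation, the prefix format and the choice of keeping the original tuples as the first copy (so that the containment between data sets holds) are all assumptions made for illustration.

```python
def make_copies(database, n_copies):
    """Return a database holding `n_copies` disjoint copies of `database`.

    database: {relation name: set of tuples of strings} (hypothetical format).
    The first copy is the original data; every further copy maps each attribute
    value to a new value carrying a copy-specific prefix, so that joins inside
    a copy are preserved while different copies never join with each other.
    """
    result = {rel: set(tuples) for rel, tuples in database.items()}
    for k in range(1, n_copies):
        prefix = f"copy{k}_"
        for rel, tuples in database.items():
            result[rel] |= {tuple(prefix + value for value in t) for t in tuples}
    return result
```

Because the same prefix is applied to every value of a copy, a student identifier shared by two relations in the original data is still shared inside each copy.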

Compared Methods and Tested Queries. In order to assess the characteristics of the proposed
optimizations, we measured the execution time of different queries with (i) the standard
encoding (identified as STD in the following), (ii) a naive optimization obtained by
only removing relations not strictly needed for answering the queries (OPT1 in the
following), and (iii) the fully optimized encoding presented in Section 3 (OPT2 in the
following). Each of these cases has been evaluated for the three semantics considered in this
paper. In order to isolate the impact of our optimizations, we disabled other optimizations
(like magic sets) embedded in the datalog evaluation engine. Clearly, such optimizations
are complementary to our own and might further improve the overall performance.
Tested queries are as follows:

Q1(X1) :- course(X2, X1), plan_data(PL, X2, _),
          student_course_plan(PL, "09089903", _, _, _).
Q2(X1) :- university(X1, _).
Q3(X1,X2,X3) :- university_degree(X1, X2), faculty(X2, _, X3).
Q4(X1,X2,X3) :- student(S, _, X1, _, _, _, _), enrollment(S, _, _),
                exam_record(S, _, _, X2, X3, _, _), S == "09089903".
Q5(X1,X2) :- student_r(S1, _, X1, _, _, _, _), exam_record_r(S1, C, _, _, _, _, _),
             student_r(S2, _, X2, _, _, _, _), exam_record_r(S2, C, _, _, _, _, _),
             S1 == "09089470", S1 <> S2.
Q6(X1,X2,X3) :- student(X1, _, _, _, _, _, _), exam_record(X1, _, _, X2, X3, _, _),
                X1 == "09089903".

Observe that Q2 involves key constraints only, while Q1 and Q3 involve both keys and acyclic
INDs; specifically, Q3 involves an SFK while Q1 involves NKC INDs. Finally, Q4, Q5 and
Q6 involve keys and cyclic NKC INDs.

Results and discussion. All tests have been carried out on an Intel Xeon X3430, 2.4 GHz,
with 4 GB of RAM, running Linux. We set a time limit of 120 minutes,
after which query execution was killed. Figures 2 and 3 show the results obtained for the
loosely-sound and the CM-complete semantics. It is worth recalling that, as we pointed out
in Section 3.2, optimizations for the loosely-exact semantics rely on the classes of
equivalence with the CM-complete semantics identified in this paper. As a consequence, we
tested this semantics only on queries Q2 and Q3, for which such equivalence holds. Then,
since the execution times of the optimized encoding coincide with the CM-complete graphs
for queries Q2 and Q3, we do not report specific figures for them.

Analyzing the figures, we observe that the proposed optimizations do not introduce
computational overhead and, in most cases, transform practically intractable queries into
tractable ones; in fact, for all the tested queries the execution time of the standard rewriting
exceeded the time limit. OPT1 helps mostly on the smallest data set; in fact, for
Infomix-x-10 it shows some gain in 33% of cases, and only in two cases for Infomix-x-50.

As for the comparison among the optimized encodings, we can observe that if the query
involves no INDs (Q2), the loosely-sound and the CM-complete optimizations
have the same performance; this confirms theoretical expectations. When acyclic INDs
are involved (Q1, Q3), the loosely-sound optimization performs slightly better because
the CM-complete one must choose the tuples to be deleted due to IND violations, whereas
the loosely-sound semantics just works on the original data. Finally, when the involved INDs
are cyclic (Q4, Q5, Q6), the performance of the CM-complete optimization further degrades
w.r.t. the loosely-sound one because recursive aggregates must be exploited to choose
deletions, which increases the complexity of query evaluation.

Fig. 2. Query evaluation execution times for the loosely-sound semantics.



4.2 Scalability analysis w.r.t. the number and kind of constraint violations 

Since the real-world scenario showed that the CM-complete semantics is more affected
than the loosely-sound one by the kind of constraints involved, we carried out a
scalability analysis on this semantics, whose results are reported next.

We considered a synthetic data set composed of three relations named $r_1$, $r_2$, and $r_3$, over
which we imposed different sets of ICs in order to analyze the scalability of our methods
depending on the presence of keys and on the presence or absence of acyclic and cyclic INDs.

Fig. 3. Query evaluation execution times for the CM-Complete semantics.



In particular, we imposed the following key constraints: $key(r_2) = \{1,2\}$ and $key(r_3) = \{1\}$,
and we experimented with three different sets of INDs: $NOINCL = \emptyset$,

$ACYCLIC = \{\, r_1(X_1,X_2,X_3,X_4) \rightarrow r_2(X_2,X_5,X_3,X_6),\; r_1(X_1,X_2,X_3,X_4) \rightarrow r_3(X_1,X_5,X_6,X_7) \,\}$,

and $CYCLIC = ACYCLIC \cup \{\, r_2(X_1,X_2,X_3,X_4) \rightarrow r_1(X_5,X_6,X_7,X_2) \,\}$. The
employed query is: query(X1, X3) :- r1(X1, X2, X3, X4), r2(X2, X3, X5, X6)? We have
randomly generated synthetic databases having a growing number of key violations on
table $r_2$. The generation process progressively adds key violations to $r_2$ by generating pairs
of conflicting tuples; after an instance of $r_2$ is obtained, tables $r_1$ and $r_3$ are generated by
taking values from $r_2$ in such a way that INDs are satisfied. In addition, for each tuple of
$r_3$ a key-conflicting tuple is generated. In order to assess the impact of the number of IND
violations, for each database instance $DB_x$, containing $x$ key violations on table $r_2$, we
generated a $DB_{x\text{-}10}$ instance where 10% of the tuples are (randomly) removed from tables
$r_1$ and $r_3$ (causing IND violations). We have generated six database instances per size
(number of key violations on table $r_2$), and plotted the time (averaged over the instances of
the same size) in Figure 4.
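
The generation process just described can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the generator used for the experiments: all function names are hypothetical, attribute values are plain integers, $r_3$ is omitted, and $r_1$ is derived from $r_2$ so that both $r_1[2,3] \subseteq r_2[1,3]$ and the cyclic IND $r_2[2] \subseteq r_1[4]$ hold before any removal.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def generate_r2(n_keys: int, n_violations: int) -> List[Row]:
    """r2 with key {1,2}: one tuple per key, plus a conflicting twin
    (same key, different non-key values) for the first n_violations keys."""
    rows: List[Row] = []
    for k in range(n_keys):
        rows.append((k, k + 1, 0, 0))
        if k < n_violations:
            rows.append((k, k + 1, 1, 1))
    return rows


def generate_r1(r2: Sequence[Row]) -> List[Row]:
    """Derive r1 from r2 so that r1[2,3] is in r2[1,3] and r2[2] is in r1[4]."""
    return [(i, row[0], row[2], row[1]) for i, row in enumerate(r2)]


def key_violations(rows: Sequence[Row], key_positions: Sequence[int]) -> int:
    """Number of key values shared by more than one distinct tuple (1-based positions)."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[p - 1] for p in key_positions)].add(row)
    return sum(1 for group in groups.values() if len(group) > 1)


def ind_holds(src: Sequence[Row], src_pos: Sequence[int],
              dst: Sequence[Row], dst_pos: Sequence[int]) -> bool:
    """Check the inclusion src[src_pos] within dst[dst_pos] (1-based positions)."""
    dst_proj = {tuple(row[p - 1] for p in dst_pos) for row in dst}
    return all(tuple(row[p - 1] for p in src_pos) in dst_proj for row in src)


def drop_percent(rows: Sequence[Row], percent: int, seed: int) -> List[Row]:
    """Randomly remove the given percentage of tuples (seeded for repeatability)."""
    rng = random.Random(seed)
    keep = len(rows) - len(rows) * percent // 100
    return rng.sample(list(rows), keep)


r2 = generate_r2(n_keys=5, n_violations=2)
r1 = generate_r1(r2)
```

In this toy instance `key_violations(r2, (1, 2))` is 2 and both INDs hold; removing tuples from `r1`, for example with `drop_percent`, may then break the cyclic IND from $r_2$ to $r_1$.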

In detail, Figure 4(a) shows the results for incrementally higher KD violations with
no IND violations. Both standard and optimized encodings have been tested. Figure 4(b)
compares the optimized encoding only, when the percentage of IND violations is 0% or
10%. Observe that, in general, even when there is no initial IND violation, the KD repairing
process may induce some of them.

The analysis of these figures shows that even if cyclic INDs are generally harder, their
scaling is almost the same as the acyclic ones. On the contrary, in the absence of INDs
the optimization may boost the performance (see the flat line in Figure 4(a)). Figure 4(b)
points out that when the number of IND violations increases, the performance may
improve. This behavior is justified by the fact that tuple deletions due to IND repairs may, in
turn, remove KD violations. This reduces the number of disjunctions to be evaluated.
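
The last observation can be made concrete with a rough count. The Python sketch below is a simplification introduced here for illustration (it is not the paper's encoding, and the function name is hypothetical): it treats each group of tuples sharing a key value as one disjunctive choice, so the number of ways to keep one tuple per key is the product of the group sizes, and it assumes the input tuples are distinct.

```python
from collections import Counter
from math import prod
from typing import Sequence, Tuple

Row = Tuple[int, ...]


def key_repair_choices(rows: Sequence[Row], key_positions: Sequence[int]) -> int:
    """Number of ways to keep exactly one tuple per key value (1-based positions).

    Assumes rows contains no duplicate tuples.
    """
    groups = Counter(tuple(row[p - 1] for p in key_positions) for row in rows)
    return prod(groups.values())


# r2 with key {1,2}: two conflicting pairs and one consistent tuple.
r2 = [(0, 1, 0, 0), (0, 1, 1, 1), (1, 2, 0, 0), (1, 2, 1, 1), (2, 3, 0, 0)]
before = key_repair_choices(r2, (1, 2))   # 2 * 2 * 1 = 4

# Suppose an IND repair deletes one tuple of the second conflicting pair.
after_deletion = [row for row in r2 if row != (1, 2, 1, 1)]
after = key_repair_choices(after_deletion, (1, 2))   # 2 * 1 * 1 = 2
```

Each deletion that dissolves a conflicting pair halves the count in this toy setting, which is consistent with the improvement observed when IND violations are present.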



5 Related work and concluding remarks 



Since the 1990s, when the founding notions of CQA (Bry 1997), GAV mapping (Garcia-Molina et al. 1997;
Tomasic et al. 1998; Goh et al. 1999), and database repair (Arenas et al. 1999) were
introduced, data integration (Lenzerini 2002) and inconsistent databases (Bertossi et al. 2005)
have been studied quite in depth.

Detailed characterizations of the main problems arising in a data integration system
have been provided, taking into account different semantics, constraints, and query types
(Calì et al. 2003a; Calì et al. 2003b; Arenas et al. 2003; Chomicki and Marcinkowski 2005;
Grieco et al. 2005; Fuxman and Miller 2007; Eiter et al. 2008).

This paper provides a contribution in this scenario by extending the decidability
boundaries for the loosely-exact semantics (so called in (Calì et al. 2003a), but first introduced by
(Arenas et al. 1999)) and the loosely-sound semantics, in case of both KDs and SFSK INDs.

A first proposal of a unifying framework for CQA in a data integration setting is
presented in (Calì et al. 2005) using first-order logic; it considers different semantics defined
by interpreting the mapping assertions between the global and the local schemas of the
data integration system. A common framework for computing repairs in a single database
setting is proposed in (Eiter et al. 2008): it covers a wide range of semantics relying on the
general notion of preorder for candidate repairs, but only universally quantified constraints
are allowed. Moreover, the authors introduce an abstract logic programming framework to
compute consistent answers. Finally, the authors propose an optimization strategy called
factorization that, as will be clarified below, is orthogonal to our own.

This paper provides a contribution in this setting since it unifies different semantics,
as in (Calì et al. 2005) and (Eiter et al. 2008), but also provides an algorithm that, given
a retrieved database, a user query q, and a semantics, automatically composes an ASP
program capable of computing the consistent answers to q. In particular, our ASP-rewriting
offers a natural, compact, and direct way for encoding even hard cases where the CQA
problem belongs to the $\Pi_2^p$ complexity class.

Theoretical studies gave rise to concrete implementations, most of which were
conceived to operate on some specific semantics and/or constraint types (Arenas et al. 1999;
Calì et al. 2002; Greco and Zumpano 2000; Greco et al. 2001; Calì et al. 2003b; Arenas et al. 2003;
Chomicki et al. 2004a; Calì et al. 2004; Chomicki et al. 2004b; Lembo 2004; Grieco et al. 2003;
Leone et al. 2005; Fuxman et al. 2003; Fuxman and Miller 2007). As an example, in (Leone et al. 2005)
only the loosely-sound semantics was supported. In this paper, we provide both a
unified framework based on ASP, and a complete system supporting (i) all three
aforementioned significant semantics in case of conjunctive queries and the most commonly
used database constraints (KDs and INDs), (ii) specialized optimizations, and (iii) a
user-friendly GUI.

Another general contribution of our work comes from a novel optimization technique
that, after analyzing the query and localizing a minimal number of relevant ICs, tries to
"simplify" their structure to reduce the number of database repairs, as they could be
exponentially many (Arenas et al. 2001). Such a technique could be classified as "vertical"
due to the fact that it reduces (whenever possible) the arity of each active relation (with
the effect, e.g., of decreasing the number of key conflicts) without looking at the data.
It is orthogonal to other "horizontal" approaches, such as magic-sets (Faber et al. 2007)
and factorization (Eiter et al. 2008), which are based on data filtering strategies. In
particular, a system exploiting ASP incorporating magic-set techniques for CQA is described
in (Marileo and Bertossi 2010). Other approaches complementary to our own are based on
first-order rewritings of the query (Arenas et al. 1999; Chomicki and Marcinkowski 2002;
Calì et al. 2003b; Grieco et al. 2005; Fuxman and Miller 2007).

The combination of our optimizations with such approaches, and further extensions of
decidability boundaries for CQA, are among our future lines of research.